CHAPTER 1

INTRODUCTION

This paper considers some of the implications of the assumption
that items in a list mutually affect each other in the course of verbal
list-learning. By a mutual effect or item interaction (or item dependency)
is meant that performance on a particular S-R pair in a list depends in
some way on the number and order of presentations of other S-R pairs in
the list. It is hardly necessary to document the fact that items do
interact in this sense; other things being equal, more errors are made to
a particular S-R pair the larger the number of other S-R pairs in
the list. Of course, these item interactions may be of a mild and uniform
sort, such as might be produced by the subject's spreading his effort
over M items rather than just one; or the interactions might be more
extreme and non-uniform, such as those postulated by a
concept-identification model (cf. Restle, 1961). We open the analysis by
drawing some conclusions from a brief review of the history of
mathematical learning models for verbal list-learning.

Probably the simplest mathematical model for verbal list-learning
is the one-element pattern model (Estes, 1959). This model was first
analyzed in depth and applied to paired-associate learning data by Bower
(1960, 1961). Since its introduction, the one-element model has received
a number of diverse interpretations; among these are the following: (1) a
stimulus pattern interpretation (Estes, 1959), (2) an all-or-none
strategy-selection (hypothesis) interpretation (Restle, 1961, 1964), (3) a
memory interpretation (Atkinson, Bower, and Crothers, 1965, pp. 87-88), and
(4) a response-elimination interpretation (Millward, 1964). Although the
model has been successful in accounting for some list-learning data, a
number of deficiencies in the model have been pointed out. Some of these
are as follows: (1) the model fails to account for individual differences
and unequal item difficulty (Postman, 1963), (2) learning may involve more
than one stage (Restle, 1964), and (3) improvement in performance may
take place prior to the last error (Suppes and Ginsberg, 1963).

Despite the ups and downs of the one-element model and its many
modifications and extensions, the basic research strategy depicted in
Bower (1961) has had a great influence on later invention and application
of models to paired-associate data. This strategy has been first to
state a (new) mathematical model for paired-associate learning (usually
some finite-state Markov model), derive a battery of statistics from this
model, estimate parameters in the model, and then attempt to account for
summary statistics of the pool of subject-item error-success sequences 1/
obtained in a list-learning experiment (usually run by the anticipation
procedure) designed to validate the model. Among the many research papers
exhibiting this four-step strategy are Atkinson and Crothers (1964),
Bower (1961), Bower and Theios (1964), Calfee and Atkinson (1963),
Millward (1964), Norman (1964), Polson, Restle, and Polson (1963), and
Restle (1964).

1/ Suppose a subject learns a list of M items by the anticipation
procedure. Then that subject contributes M error-success subsequences,
one for each item, to the pool of subject-item error-success sequences.

A few of the models presented in these references have psychological
rationales which assume that the learning of a particular S-R pair
proceeds independently of the states and responses of other items in the
list (e.g., the one-element model (Bower, 1961) and the random-trial
increments model (Norman, 1964)). For these models it seems reasonable to
study separate error-success sequences for each subject-item, since the
model assumes that each of these sequences represents an independent sample
path from the stochastic process represented in the model. However, for
a number of other models, this independence of subject-item error-success
sequences is placed in immediate question by the psychological theory
postulated to underlie the model which is applied to these sequences.

For example, the trial-differential-forgetting (T.D.F.) model suggested
by Atkinson and Crothers (1964) and developed in Calfee and Atkinson (1965)
postulates that the more intervening unlearned items between two successive
presentations of a particular S-R pair, the greater the chance that the
pair passes out of the short-term memory state and is thus forgotten.
This assumption very definitely implies that, for a particular subject,
error-success protocols for each item are not independent. Also, the
strategy-selection theory of Restle (1964) implies that confusable items
produce very non-independent error-success protocols; i.e., if $S$-$R_1$
and $S'$-$R_2$ are two such S-R pairs, the error-success processes on the
two should be related, since subjects may confuse $S$ and $S'$.

At best, an application of these models to a pool of error-success
protocols which lack a stimulus tag or a subject tag represents an
approximation to the true state of affairs. When applying the T.D.F. model
to data (Calfee and Atkinson, 1965), it is assumed that the average number
of unlearned items, $F_n$, intervening between the $n$th and $(n+1)$st
presentations of a given item applies to all items in a list. Under this
approximation, the theory takes the form of a finite-state inhomogeneous
Markov chain. This chain is designed to account for the error-success
protocols for each subject-item in the experiment. The approximation
that Restle uses to account for subject-item protocols is discussed in
detail in Chapter 5, pp. 108-117 of this paper. Basically, he neglects
the interrelationships between a pair of confusable items in his
applications.

The major psychological ideas in these latter two theories are as
follows: (1) the T.D.F. model is based on the idea that unlearned items,
when they are presented, cause items in short-term memory to be bumped
into a forgotten state; and (2) the strategy-selection theory is based
on the idea that stimulus confusion ($S$-$R_1$, $S'$-$R_2$) is overcome
in an all-or-none manner. In both cases we have seen that in order to
apply the theory to a pool of subject-item error-success sequences in an
anticipation procedure experiment, the major new variable in the theory is
represented as an "average" quantity. However, by their nature, both
the memory assumption and the confusion assumption imply highly
differential effects on response probability to a particular S-R pair as a
function of the number and order of other preceding S-R pairs. The
implications of these two assumptions can be powerfully tested either by
designing an experiment where S-R presentations are highly controlled
or by utilizing statistics in the data that relate performance on
separate items (or both possibilities together). Experiments and analyses
of this nature have been performed on the memory assumption (Bjork,
unpublished doctoral dissertation; Greeno, 1966; and Atkinson and Shiffrin,
1965) and on the stimulus confusion assumption (Restle, 1964; Ruskin,
unpublished doctoral dissertation; and Shepard, Hovland, and Jenkins,
1961). Finally, it should be mentioned that although the T.D.F.
model and strategy-selection theory were singled out as being convenient
examples of approaches to item dependencies, other models have also
attempted to handle this problem.

This paper considers both the mode of data analysis and the method
of S-R presentation for a number of restricted theoretical assumptions
involving item interactions in S-R list-learning experiments. Chapter 2
considers the problem of level of data analysis, i.e., the problem of how
to use data in a list-learning experiment to bear on a psychological
theory or to evaluate a model. By this concept is meant the following:
each subject in a list-learning experiment can be conceptualized as
emitting a single finite data sequence. A particular member, $x_n$, of this
sequence consists of the stimulus presented to the subject on the $n$th
trial, $S_n$, and his response to that stimulus, $A_n$. Thus, for a given
subject $i$, the data are of the form

(1.1)    $x_1 x_2 \cdots x_N = S_1 A_1 \, S_2 A_2 \cdots S_N A_N$,

where $N$ S-R presentations are given to subject $i$ in the experiment.

In order to analyze data in a list-learning experiment, researchers
transform this primary datum in ways to extract what they regard as its
informative aspects. For example, for a subject-item error-success analysis,
the primary datum is separated into subsequences, one for each item, and
then the SA terms in these shorter sequences are transcribed as errors
or successes. The particular way in which the primary datum is reduced
represents the level of analysis.

More specifically, by level of data analysis is meant the collection
of stimulus classes that define the error-success subsequences used in an
analysis. With each class there is associated a single error-success
sequence consisting of the chronological record of responses to members
of the class. The subject-item analysis or paired-associate level
(P-level) analysis consists of singleton stimulus classes, i.e., one for
each item. On the other hand, a concept-level analysis (Atkinson, Bower,
and Crothers, 1965, pp. 30-31) groups all stimuli in a list to define a
single stimulus class giving rise to one error-success protocol for each
subject. The units of a given level are the particular stimulus classes,
e.g., for a P-level analysis, the units are the individual items. Another
level of analysis discussed in Chapter 2 is as follows. Suppose a
list of J·M S-R pairs is composed of J classes of M S-R pairs,
where the items in any class of M items are interrelated and paired
with the same response. The rule level (R-level) of analysis is defined
to be the analysis where each group of M stimuli forms a stimulus
class which defines a single error-success sequence for the class. Thus
each subject would donate J error-success sequences for an R-level
analysis. The units for this analysis would be the J classes of stimuli.
Chapter 2 discusses methods of drawing inferences from a model (or
psychological theory) by investigating alternative levels of analysis on
the same set of data.
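
The reduction of a primary datum to a given level of analysis can be
sketched in a few lines of Python. This is only an illustration under
stated assumptions: the function name, the toy list of J = 2 classes of
M = 2 items, and the response protocol are all hypothetical, and errors
are coded 1 and successes 0.

```python
from collections import defaultdict


def error_success_sequences(primary_datum, correct, stimulus_class):
    """Split one subject's primary datum into error-success subsequences.

    primary_datum  : list of (stimulus, response) pairs in presentation order
    correct        : dict mapping each stimulus to its correct response
    stimulus_class : dict mapping each stimulus to the unit it belongs to
    Returns a dict: unit -> chronological list of 1s (errors) and 0s (successes).
    """
    sequences = defaultdict(list)
    for stimulus, response in primary_datum:
        error = 0 if response == correct[stimulus] else 1
        sequences[stimulus_class[stimulus]].append(error)
    return dict(sequences)


# Hypothetical toy list: two classes (responses 1 and 2) of two items each.
correct = {"a1": 1, "a2": 1, "b1": 2, "b2": 2}
datum = [("a1", 2), ("b1", 2), ("a2", 1), ("b2", 1),
         ("b2", 2), ("a1", 1), ("b1", 2), ("a2", 1)]

# P-level: singleton stimulus classes, one per item.
p_level = error_success_sequences(datum, correct, {s: s for s in correct})
# R-level: one class per response (all items sharing a response).
r_level = error_success_sequences(datum, correct, dict(correct))
# Concept level: a single class containing every stimulus in the list.
concept_level = error_success_sequences(datum, correct,
                                        {s: "list" for s in correct})
```

The only thing that changes between levels is the mapping from stimuli
to units, which is the sense in which a level is a collection of
stimulus classes.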

Chapter 3 extends Chapter 2 in the following sense: while much of
Chapter 2 concerns the one-element model, Chapter 3 presents a model
which is analogous but which allows subjects to learn either a particular
S-R pair or a collection of related S-R pairs on a particular trial.
This model is called the all-or-none multi-level model, since it assumes
that learning can take place at two levels simultaneously. These two
levels are the P-level, corresponding to a P-level data analysis,
and the R-level, corresponding to an R-level data analysis. Alternatives
to the paired-associate anticipation procedure (i.e., the procedure
whereby random permutations of the entire list are presented
sequentially) are introduced, and some of the implications of the
all-or-none multi-level model for these experimental procedures are
presented.

The models discussed in Chapters 2 and 3 are not designed to represent
a theory of paired-associate learning but to indicate how inferences can
be made by considering a fixed model on various levels of data analysis
and in various experimental settings.

Chapter 4 establishes a mathematically rigorous basis for analyzing
a large class of models which embody item dependencies. In this
formulation, each item is allowed to have a different effect on the states
of all the items in the list when it is presented for an anticipation
trial; and, further, the state of each unpresented item in the list can
affect the response probabilities and transition probabilities of the
presented item. Among the motivations for developing this general
mathematical framework are the following:

(1) The analysis of the all-or-none multi-level model in Chapter 3
is limited, owing to the difficulty in deriving properties of the
model from the axiomatization presented in that chapter. The
formulation of the all-or-none multi-level model in Chapter 3 is along
the lines that models are conventionally axiomatized in the
literature (cf. Atkinson, Bower, and Crothers, 1965);
namely, a particular item is singled out and the various things
which the model postulates can happen to that item are presented.
At the outset of Chapter 4, the argument is made that when a model
postulates item interactions, it might be more profitably analyzed
in the context of a set of axioms that describe the things that
can happen to the whole list of S-R pairs upon a presentation of
a particular item. The chapter then develops this analysis and
demonstrates that it helps overcome analytical difficulties that
were inherent in the single-item axiomatization.

(2) An increasing number of mathematical models for list learning
are embodying processes which involve item dependencies. Therefore,
such models might profit from an analysis in terms of a framework
designed to handle these dependencies. The argument for this case
is presented in more detail in Chapter 4, pp. 49-53.

(3) Many experimenters have argued that most current list-learning
experiments involve processes which concern interrelationships
between items during the course of learning. Investigators have
discovered a variety of psychological processes which operate, in
varying degree, in such experiments. Most notable are the following
processes: (i) memory and its organization (cf. Peterson, 1965;
Melton, 1963), (ii) coding processes (cf. Symposium on coding and
conceptual processes in verbal learning, articles by Battig, Cohen,
Cofer, Tulving, Kendler, Shepard, 1966), and (iii) in second-language
learning, dependencies arising either because of transfer
from English or because of linguistic dependencies that are built
into the second language (Crothers and Suppes, in press). The
formulation in Chapter 4 is designed to handle models which postulate
processes like these and others which are similar.

(4) Traditionally, mathematical learning models have not been
stated on levels that are general enough to constitute theories of
paired-associate learning. By this is meant that many of the
learning models are designed to predict performance only for a particular
experimental procedure and level of data analysis. A model
formalized in the framework of Chapter 4 can, in principle, predict
performance for any mode of S-R presentation chosen for
experimentation. The stochastic process which predicts performance for a
particular presentation schedule comes as a logically tight
derivation from the theory and does not represent the theory itself.
Examples of how a stochastic model is derived from a general
learning model axiomatized in the framework of Chapter 4 are presented
in Chapter 5, pp. 103, 105, 115.

(5) Another contributor to the motivation for including Chapter 4
is the bias that progress in mathematical learning theory need not
always be made by proposing a new theory of verbal learning (this
is not attempted in the paper) but can also be made by bringing formal
tools to the task of constructing new methods for drawing inferences from
data (for example, the correlational analyses developed in Chapter 2,
pp. 22-24) as well as constructing a formal framework for drawing
conclusions from a theory once it is stated.

For these reasons, it is felt that Chapter 4 represents a definite
contribution to mathematical learning theory, over and above the more
specific developments in the other chapters. Nonetheless, the
contribution does not represent a final solution to the problems we have
raised. Of course, these problems and observations which motivated
Chapter 4 had been previously recognized by other investigators, and they
are pooled to warn the reader of the particular bent that the paper
(especially Chapter 4) will take.

Chapter 5 illustrates how the framework of Chapter 4 can be applied
to specific models. An analysis of the mixed model paralleling that of
Atkinson and Estes (1965) is presented in terms of the framework.
Results for various presentation schedules are presented to illustrate the
flexibility of the framework. Next the all-or-none multi-level model
receives an additional analysis (to that given in Chapter 3) in terms of
the framework. The additional feature of this analysis is that the
process of deriving Markov models for a particular choice of presentation
schedule is illustrated (Chapter 5, pp. 103-105). Finally, Restle's
strategy-selection theory is developed in terms of the framework, and
several problems with its earlier axiomatizations are met squarely by
this analysis.

In Chapter 6, several experiments that the writer has conducted
are briefly discussed. In addition, possible directions for further
experimentation and analysis of multi-level processes in list-learning
are indicated.



CHAPTER 2




DATA ANALYSIS ON VARIOUS LEVELS 



In this chapter an analysis of the problem of levels of learning is
initiated in a somewhat restrictive situation. Suppose one has a list
of S-R pairs to be presented to subjects by the anticipation procedure.
Assume the list is structured so that groups of stimuli paired to the
same response have inter-relationships, e.g., all stimuli paired to a
certain response start with the letter A, or all stimuli in a certain
response class are names of animals.

For ease of presentation it will be assumed, for the moment, that
learning is either on the single item level (P-level), on the rule level
(R-level), or on both. Let us illustrate this with the following list:



    Stimulus       Response

    LEBESGUE          1
    RIEMANN           1
    STIELTJES         1
    FISHER            2
    BENKO             2
    RESHEVSKY         2
    STICKLES          3
    PARKS             3
    CASEY             3



Depending on instructions and whether or not the subject is familiar
with mathematicians responsible for a method of integration, contemporary
American chess players, or offensive ends for the San Francisco
Forty-Niners, respectively, the subject might learn single S-R pairs or
groups of S-R pairs. The assumption made in this chapter is that the
unit of learning is the single item (P-level), or the set of 3 items
related by the rule (R-level).

The concern of this chapter is with the properties of performance
measures, i.e., the predictions a learning model might make for various
ways of viewing the performance data. Considering only errors and
successes, the primary datum from a subject in an anticipation procedure
experiment is a long string of stimulus-result (error or success) pairs.
The two modes of data analysis corresponding to the theoretical notions
of P- and R-level learning are as follows: 1. For a P-level analysis
we abstract and pool all subsequences from the primary datum
corresponding to each stimulus in the list, and 2. For an R-level analysis
we abstract and pool all subsequences corresponding to a particular
response. The example list has 9 P-level subsequences and 3 R-level
subsequences for each subject.

In the literature, models are usually developed with a particular
level of data analysis in mind (e.g., Bower, 1961; Restle, 1961). Even
so, a model can be viewed as a stochastic process which generates
sequences of 1s and 0s (errors and successes). If one wishes to apply
a model, viewed in this way, to his data, he must choose a level (or
levels) on which to apply it (e.g., Suppes, Crothers, Weir, and Trager,
1962).

Any nontrivial learning model 2/ predicts that data will look
different when analyzed on different levels. As a proof consider a primary
datum as a string of 1s and 0s. Depending on which subsequences are
abstracted for analysis, different results on such statistics as, for
example, the proportion of 1s in the fifth place (i.e., Pr(error on
"trial" 5)) are likely.

2/ Of course, for a trivial model producing strings of all zeros, each
subsequence would also consist of all zeros and provide a single
counterexample.

It is a logical possibility that two learning models could agree
in predictions on one level of analysis but disagree on another. To see
this possibility consider the primary datum of strings of 1s and 0s;
two models might agree on the probability distributions over subsequences,
but non-independence considerations might cause them to disagree on
distributions over the primary datum level. For example, it is possible
that a simple model could fit P-level data and yet fail to account for
an R-level analysis of the same data. Finally, it is possible for a
choice of models to be correct but a choice of level of analysis to be
wrong. Such a possibility must have occurred to Suppes et al. (1962),
who actually used the same model on several levels of data analysis.

Comparison of One-element P-level and R-level Models

In this section we shall investigate the implications of the one- 
element model holding on either the R-level or the P-level. To illus- 
trate some of the points above, both the R-level and P-level analysis 
of data generated by the P-level and R-level one-element model will be 
presented. In the next chapter a model allowing both types of learning 
will be presented. 

The one-element model to be used in this analysis takes the following
form (Estes, 1959; Bower, 1961). The unit to be learned starts in an
unlearned state U. On the presentation of a unit in state U, the
correct response is made with probability g and an error with
probability 1-g. After response, the unit shifts to a learned state L
with probability c and remains in U with probability 1-c. Units
in L are always responded to correctly, and, once in L, a unit
remains there. These assumptions are conveniently summarized by the
transition matrix for the implied two-state Markov chain:

$$
\begin{array}{cc|cc|c}
 & & \multicolumn{2}{c|}{\text{state on trial } n+1} & \Pr(\text{correct} \mid \text{row state}) \\
 & & L & U & \\
\hline
\text{state on} & L & 1 & 0 & 1 \\
\text{trial } n & U & c & 1-c & g
\end{array}
$$

If the unit is a single item, we shall refer to the model as the
one-element P-level model. If the unit is a group of items paired with
the same response, we shall refer to the model as the one-element R-level
model. Logically there are four possibilities for jointly considering
the level of data analysis and the type of one-element model. These are
(P,P), (P,R), (R,P), and (R,R), where the first letter refers to the
level that data statistics are examined and the second identifies the
model.
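
The two-state chain just described is simple enough to simulate
directly. The following Python sketch is only an illustration of the
stated axioms; the function names, parameter values, and sample sizes
are assumptions made for the example and are not taken from the paper.

```python
import random


def simulate_unit(g, c, n_trials, rng):
    """Return one error-success sequence (1 = error, 0 = success) for a unit.

    The unit starts in the unlearned state U. In U the response is correct
    with probability g; after the response the unit moves to L with
    probability c. In L the response is always correct.
    """
    learned = False
    sequence = []
    for _ in range(n_trials):
        if learned:
            sequence.append(0)
        else:
            sequence.append(0 if rng.random() < g else 1)
            if rng.random() < c:        # learning follows the response
                learned = True
    return sequence


def mean_total_errors(g, c, n_units=5000, n_trials=100, seed=0):
    """Monte Carlo estimate of the mean number of errors per unit."""
    rng = random.Random(seed)
    total = sum(sum(simulate_unit(g, c, n_trials, rng))
                for _ in range(n_units))
    return total / n_units
```

Because a unit stays in U for a geometrically distributed number of
trials with mean 1/c and errs on each of them with probability 1-g,
the estimate from `mean_total_errors(g, c)` should lie close to
(1-g)/c when `n_trials` is large enough for learning to be complete.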

The (P,P) and (R,R) analyses are analogous to the usual
paired-associate analysis of the one-element model (Bower, 1961) and the
concept-level analysis of the all-or-none concept model 3/ (Restle, 1961).
The reader wishing to review these analyses in greater detail is referred
to Atkinson, Bower, and Crothers (1965, Chapters 2, 3).

3/ Restle's model has a learning-only-on-errors assumption, whereas the
one-element R-level model assumes learning is equally probable after a
success or an error.

The (P,R) and (R,P) analyses are less usual and require some
comment. A (P,R) analysis consists of plotting data statistics on the
P-level when data have been generated by the one-element R-level model.
In other words, the model implies the unit is the collection of M items
related by a rule; the learning of this unit is governed by the R-level
model; however, for analysis, the unit is broken into P-level
subsequences, one for each item. Rather than present a simulation of data
generated and analyzed in this way, the derivation of P-level statistics
for arbitrary parameter values of the one-element R-level model will be
presented. These derivations assume the anticipation procedure.
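
Although the text proceeds by derivation, a (P,R) data set is easy to
generate as a check on intuition. The Python sketch below is a
hypothetical illustration: one class of M items is learned as a single
unit under the one-element R-level model, the class members are
presented in a fresh random order on every cycle (an assumption about
the anticipation procedure), and the record is then split into P-level
subsequences. All names and parameter values are invented for the
example.

```python
import random


def simulate_class(g, c, M, n_cycles, rng):
    """Generate P-level error-success sequences from an R-level unit.

    Returns a list of M sequences (1 = error), one per item in the class.
    The learned/unlearned state belongs to the class as a whole.
    """
    learned = False
    sequences = [[] for _ in range(M)]
    for _ in range(n_cycles):
        order = list(range(M))
        rng.shuffle(order)              # random order within the cycle
        for item in order:
            if learned:
                sequences[item].append(0)
            else:
                sequences[item].append(0 if rng.random() < g else 1)
                if rng.random() < c:    # the whole class is learned at once
                    learned = True
    return sequences


def mean_errors_per_item(g, c, M, n_classes=4000, n_cycles=40, seed=0):
    """Monte Carlo estimate of the mean total errors on a single item."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_classes):
        total += sum(sum(seq) for seq in simulate_class(g, c, M, n_cycles, rng))
    return total / (n_classes * M)
```

The class as a whole is expected to produce (1-g)/c errors, and by
symmetry each of its M items receives an equal share, so the estimate
should be near (1-g)/(Mc).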

The (R,P) analysis is analogous to the (P,R) analysis except 
that data are examined on the R-level and the model which generates the 
data is a P-level model. In other words, several units (items in this 
case) are combined into a single unit and studied. 

To undertake the comparison of these four possibilities a set of 
statistics was selected. These statistics were selected both because 
they are among those usually considered in applications of models to 
verbal learning data (cf. Bower, 1961) and because they reflect salient
points to be made in the analysis. These statistics are the learning 
curve, probability of an error on trial n+1 given an error on trial 
n, probability of no more errors following an error on trial n, dis- 
tribution, mean, and variance of the total errors T, distribution and 
mean of the trial of the last error L, and the probability of an error 
on trial n prior to the last error. 

To avoid future confusion a word about the meaning of "trial" is in
order. By a trial on a unit is meant any presentation of any member of
that unit, and by the kth trial on a unit is meant the kth occurrence of
members of the unit. To illustrate, consider the list on p. 11. The
fifth trial of the P-level analysis would refer to an item's performance
on the fifth cycle through the list, i.e., performance somewhere in the
37-45 trial block, depending on when the item appears on its fifth cycle.
However, the fifth R-level trial would occur somewhere midway in the
second cycle of the list; i.e., the fifth R-level trial would refer to
performance on the trial number of the fifth presentation of members of
a category. This event would be constrained by the anticipation
procedure to take place midway in the second cycle through the list.

Before presenting the results of the analysis, a further word about
notation is needed. Define $x_n$ to be the error-success random variable
as follows:

$$
x_n = \begin{cases}
1 & \text{if error on the } n\text{th trial of the unit,} \\
0 & \text{if success on trial } n.
\end{cases}
$$

Let T be the total error random variable (T = k means k errors
made on a particular unit), and let L represent the trial number of
the last error. Then the statistics chosen for the comparison are the
following, for $n \geq 1$ and $k \geq 0$:

1. $\Pr(x_n = 1)$,
2. $\Pr(x_{n+1} = 1 \mid x_n = 1)$,
3. $b_n = \Pr(\text{no more errors following an error on trial } n)$,
4. $\Pr(T = k)$, $E(T)$, $\mathrm{Var}(T)$,
5. $\Pr(L = k)$, $E(L)$,
6. $\Pr(x_n = 1 \mid L > n)$.
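
For readers who want to compute these quantities from data, the
following Python sketch shows one way the empirical counterparts might
be obtained from a pool of error-success sequences. The function names
and the four-sequence pool are hypothetical, and L is taken to be 0
for a sequence with no errors, as the statistic Pr(L = 0) suggests.

```python
def total_errors(seq):
    """T: total number of errors (1s) in one error-success sequence."""
    return sum(seq)


def trial_of_last_error(seq):
    """L: 1-based trial number of the last error, or 0 if there are none."""
    last = 0
    for n, x in enumerate(seq, start=1):
        if x == 1:
            last = n
    return last


def learning_curve(pool, n):
    """Estimate Pr(x_n = 1) as the proportion of errors on trial n."""
    return sum(seq[n - 1] for seq in pool) / len(pool)


def conditional_error(pool, n):
    """Estimate Pr(x_{n+1} = 1 | x_n = 1); None if no errors on trial n."""
    following = [seq[n] for seq in pool if seq[n - 1] == 1]
    return sum(following) / len(following) if following else None


def error_before_last(pool, n):
    """Estimate Pr(x_n = 1 | L > n); None if no sequence has L > n."""
    eligible = [seq[n - 1] for seq in pool if trial_of_last_error(seq) > n]
    return sum(eligible) / len(eligible) if eligible else None


# Hypothetical pool of four subject-unit sequences of four trials each.
pool = [[1, 1, 0, 0], [0, 1, 0, 0], [1, 0, 1, 0], [0, 0, 0, 0]]
totals = [total_errors(seq) for seq in pool]
last_errors = [trial_of_last_error(seq) for seq in pool]
```

The same functions apply unchanged at any level of analysis, since a
level only determines which subsequences enter the pool.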

The interesting comparisons of the four situations involve fixing
the level of data analysis and varying the model. This is what is
usually done in comparative studies of models (cf. Atkinson and Crothers,
1964). In Table 2.1 the (P,P) and (P,R) analyses are compared, and
in Table 2.2 the (R,P) and (R,R) analyses are considered. Appendix
I illustrates typical derivations of equations presented in Tables 2.1
and 2.2. In these tables we shall refer to the parameters of the usual
one-element model by c' and g'; c and g will be the parameters of
the model analyzed on the inappropriate level, i.e., c' and g' for
(P,P) and (R,R), and c, g for (P,R) and (R,P). We shall assume
M items are paired to each response. Those readers not interested in
pondering the tedious derivations in Appendix I may note that for M = 1,
expressions derived for (P,R) and (R,P) should take the same form as
those of (P,P) and (R,R) respectively.

Certain similarities in expressions under (P,P) and (P,R) are
evident from the table. $\Pr(x_n = 1)$, $\Pr(T = k)$, and $\Pr(L = n)$ are
geometric distributions for both (P,P) and (P,R). Also
$\Pr(x_{n+1} = 1 \mid x_n = 1)$, $b_n$, and $\Pr(x_n = 1 \mid L > n)$ are
constant over trials for both situations.

It is, however, immediately evident that a one-element model will
not fit data statistics in (P,R). There are a number of ways to
demonstrate this and one will be presented. Suppose that the one-element
P-level model does fit data statistics in (P,R). Then, from Table 2.1,
we have

$$
\Pr{}^{(P,P)}(x_n = 1 \mid L > n) = \Pr{}^{(P,R)}(x_n = 1 \mid L > n),
$$

which requires the functional identity

$$
1 - g' = 1 - g
$$

or

(2.1)    $g' = g.$

Now equating expressions for $\Pr(x_n = 1)$ yields the identity

$$
(1-g')(1-c')^{n-1} = (1-g)\,\frac{1-(1-c)^M}{Mc}\,(1-c)^{M(n-1)},
$$

which, inserting (2.1), requires
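(2.2)    $(1-c')^{n-1} = \dfrac{1-(1-c)^M}{Mc}\,(1-c)^{M(n-1)}.$

Equation (2.2) is satisfied only if M = 1 and c' = c, but then the R-level
model would reduce to the P-level model. Thus, unless M = 1, the
(P,R) analysis cannot be fit by the one-element P-level model, i.e.,
the two models are not equivalent on the P-level.

Table 2.1. Comparison of (P,P) and (P,R).

$$
\begin{array}{lll}
\text{Statistic} & (P,P)\ \text{analysis} & (P,R)\ \text{analysis} \\
\hline
1.\ \Pr(x_n = 1) & (1-g')(1-c')^{n-1} & (1-g)\,\dfrac{1-(1-c)^M}{Mc}\,(1-c)^{M(n-1)} \\[2ex]
2.\ \Pr(x_{n+1} = 1 \mid x_n = 1) & (1-g')(1-c') & (1-g)(1-c)^M \\[2ex]
3.\ b_n & b' = \dfrac{c'}{1-g'(1-c')} & b = \dfrac{1-(1-c)^M}{1-g(1-c)^M} \\[2ex]
4.\ \text{i. } \Pr(T = 0) & g'b' & 1 - \dfrac{(1-g)\,b}{Mc} \\[2ex]
\phantom{4.\ }\text{ii. } \Pr(T = k)\ (k > 0) & (1-g'b')(1-b')^{k-1}b' & \dfrac{(1-g)\,b}{Mc}\,(1-b)^{k-1}\,b \\[2ex]
\phantom{4.\ }\text{iii. } E(T) & \dfrac{1-g'}{c'} & \dfrac{1-g}{Mc} \\[2ex]
\phantom{4.\ }\text{iv. } \mathrm{Var}(T) & E(T)\left[\dfrac{2-b'}{b'} - E(T)\right] & E(T)\left[\dfrac{2-b}{b} - E(T)\right] \\[2ex]
5.\ \text{i. } \Pr(L = 0) & g'b' & 1 - \dfrac{(1-g)\,b}{Mc} \\[2ex]
\phantom{5.\ }\text{ii. } \Pr(L = n)\ (n > 0) & (1-g')(1-c')^{n-1}\,b' & \dfrac{(1-g)\left[1-(1-c)^M\right]}{Mc}\,b\,\left[(1-c)^M\right]^{n-1} \\[2ex]
\phantom{5.\ }\text{iii. } E(L) & \dfrac{(1-g')\,b'}{c'^{\,2}} & \dfrac{1-g}{Mc\left[1-g(1-c)^M\right]} \\[2ex]
6.\ \Pr(x_n = 1 \mid L > n) & 1-g' & 1-g
\end{array}
$$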

Thus similarities in equation type exist between (P,P) and (P,R);
however, the expressions are different functions of the parameters.
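
The difference between the two P-level learning curves can be checked
numerically. The Python sketch below is an illustration under an
explicit assumption: on its nth presentation an item of an R-level
unit is equally likely to be preceded, within the current cycle, by
0, 1, ..., M-1 presentations of other members of its class. The
function names are hypothetical.

```python
def pr_error_pp(n, g, c):
    """Learning curve Pr(x_n = 1) for the one-element P-level model."""
    return (1 - g) * (1 - c) ** (n - 1)


def pr_error_pr(n, g, c, M):
    """P-level learning curve when an R-level unit of M items is learned.

    Averages over j, the number of class members already presented in the
    item's nth cycle; the unit has then had M(n-1) + j learning
    opportunities (assumed equally likely values of j).
    """
    return sum((1 - g) * (1 - c) ** (M * (n - 1) + j) for j in range(M)) / M


def pr_error_pr_closed(n, g, c, M):
    """Closed form of the same average, obtained from a geometric sum."""
    return (1 - g) * (1 - (1 - c) ** M) / (M * c) * (1 - c) ** (M * (n - 1))
```

With g' = g, the two curves already disagree on trial 1 whenever
M > 1, because the averaging factor [1 - (1-c)^M]/(Mc) is then below
one; no choice of c' can repair this, since c' does not enter the
trial 1 prediction of the P-level model.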

After presenting the results of the (R,P) vs. (R,R) comparison,
several other comparisons between (P,P) and (P,R) not depending on
the choice of a particular model will be developed. Also in Chapter 3
a model involving both levels of learning will be presented, and the
relative contribution of each sort of learning will be assessed.

In Table 2.2 the comparison between (R,R) and (R,P) is presented.
The contrasts are more striking than for (P,P) vs. (P,R), so not all
statistics will be presented in closed form. Again we are assuming a
list of size M. Finally one further convention is needed. If N is
an R-level trial we need the cycle number K(N) of an item appearing on
that trial. Since the Kth P-trial of an item is restricted to the
R-trial interval ((K-1)M + 1, KM), we have

(2.3)    $K(N) = \max\{k : M(k-1) < N\}.$
In cases where it is obvious we will denote K(N) by K. Table 2.2 
now follows. 
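Table 2.2. Comparison of (R,R) and (R,P)

(Parameters of the R-level model are written c', g'; those of the P-level model are written c, g.)

1. $\Pr(x_N = 1)$
   (R,R) analysis: $(1-g')(1-c')^{N-1}$
   (R,P) analysis: $(1-g)(1-c)^{K(N)-1}$

2. $\Pr(x_{N+1} = 1 \mid x_N = 1)$
   (R,R) analysis: $(1-g')(1-c')$
   (R,P) analysis: $\frac{1}{M}(1-c)(1-g) + \frac{M-1}{M}(1-c)^{K}(1-g)$ if N mod M = 0;
                   $(1-c)^{K-1}(1-g)$ if N mod M ≠ 0.

3. $b_N$
   (R,R) analysis: $b' = \dfrac{c'}{1-g'(1-c')}$
   (R,P) analysis: $b_N$ [defining expression illegible in source]. This function increases with N.

4. i) $E(T)$
   (R,R) analysis: $\dfrac{1-g'}{c'}$
   (R,P) analysis: $\dfrac{M(1-g)}{c}$

   ii) $\mathrm{Var}(T)$
   (R,R) analysis: [illegible in source]
   (R,P) analysis: [illegible in source]

   iii) $\Pr(T = 0)$
   (R,R) analysis: $\dfrac{g'c'}{1-g'(1-c')}$
   (R,P) analysis: $\left[\dfrac{gc}{1-g(1-c)}\right]^{M}$

   iv) $\Pr(T = k)$ for k > 0
   (R,R) analysis: $(1-g'b')(1-b')^{k-1}b'$
   (R,P) analysis: Not obtainable in closed form by the writer.

5. i) $\Pr(L = 0)$
   (R,R) analysis: $\dfrac{g'c'}{1-g'(1-c')}$
   (R,P) analysis: $\left[\dfrac{gc}{1-g(1-c)}\right]^{M}$

   ii) $\Pr(L = N)$ for N > 0
   (R,R) analysis: [illegible in source]
   (R,P) analysis: [illegible in source; given separately for N < M and for later cycles, in terms of the probability an item has its last error on or before its k-th cycle]

   iii) $E(L)$
   (R,R) analysis: $\dfrac{1-g'}{c'[1-g'(1-c')]}$
   (R,P) analysis: NOT DERIVED

6. $\Pr(x_N = 1 \mid L > N)$
   (R,R) analysis: $(1-g')$
   (R,P) analysis: NOT DERIVED
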
There are several striking contrasts between (R,P) and (R,R).
$\Pr(x_N = 1)$ for (R,P) is flat in periods of M R-trials, i.e.,
$\Pr(x_N = 1)$ is equal to $(1-g)$ for the first M trials, $(1-c)(1-g)$
for the second M trials, and $(1-c)^2(1-g)$ for the third M trials,
etc. In addition most other trial-dependent statistics take jumps on
trials kM + 1 for k = 0, 1, 2, .... Finally several statistics are
not constant with trials for (R,P) but are for (R,R), e.g.,
$\Pr(x_{N+1} = 1 \mid x_N = 1)$.

The similarities between (R,P) and (R,R) are few. When they
do exist, they derive from the fact that the learning of each item
proceeds independently. This is most strikingly seen in Pr(T = 0), E(T),
and Var(T).
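
The contrast between the two learning curves can be tabulated directly from the first row of Table 2.2. The following is only a minimal sketch: the function names and the parameter values (M = 4, c = 0.2, g = 0.25) are illustrative assumptions, not values taken from the text.

```python
def cycle_number(N, M):
    """K(N) = max{k : M(k-1) < N}: the cycle that contains R-trial N."""
    return (N + M - 1) // M


def pr_error_rr(N, c_prime, g_prime):
    """(R,R): Pr(x_N = 1) declines geometrically on every R-trial."""
    return (1 - g_prime) * (1 - c_prime) ** (N - 1)


def pr_error_rp(N, M, c, g):
    """(R,P): Pr(x_N = 1) is flat within a cycle and drops at trials kM + 1."""
    return (1 - g) * (1 - c) ** (cycle_number(N, M) - 1)


if __name__ == "__main__":
    M, c, g = 4, 0.2, 0.25  # illustrative values only
    for N in range(1, 13):
        print(N, cycle_number(N, M),
              round(pr_error_rr(N, c, g), 4),
              round(pr_error_rp(N, M, c, g), 4))
```

Printing the two columns side by side shows the (R,P) curve holding constant for M trials at a time while the (R,R) curve falls on every trial.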

In summary this section has illustrated that the choice of a level 
of data analysis can influence the appearance of data statistics in much 
the same way that a model, if valid, can influence these statistics. A 
second point is that a model not only generates predictions for statis- 
tics on the intended level of analysis, but it also generates predictions 
on any level. This fact suggests that analyses on several levels in an 
experiment might provide supporting evidence for the validity of a model. 
The next section presents some cross-level analyses not restricted by
choice of model.

Model-Free Analyses of P- and R-level Learning

Next we discuss model-free methods for determining when some learning
takes place at a higher level (more R-like) or a lower level (more
P-like) than the level of data analysis. What is meant by "model-free"
needs some clarification. We view performance, not learning. Thus some
sort of theory (or model) must be assumed to infer learning from
performance data. By model-free is meant that we are assuming only that a
change in the degree of learning of a unit manifests itself in a
corresponding change in the probability of a correct response to all items
in that unit (i.e., an operational definition of "learning on a level" is
desired).

In this section we parallel the structure of the preceding section. 
First methods for determining when learning takes place at a higher level 
than level of analysis will be discussed, and then indications of when 
learning takes place at a lower level than the level of analysis will be 
developed. For ease of presentation we will present these results in 
the context of the P- and R-levels of the preceding section. It should
be clear how to generalize these results to the case where more levels 
exist. 

Now we consider methods of indicating when learning is at a higher
level than analysis. Accordingly, consider the case where some learning
takes place on the R-level. For simplicity suppose M = 2, i.e., pairs
of related items are assigned the same response. Imagine the two P-level
protocols for an item pair are lined up one above the other. Since, by
assumption, a single learning event may have resulted in simultaneous
learning of both items in the pair, the sequences should bear a
relationship to each other. For example, if the one-element R-level model
held with g = 0, the pair of last error trials for the two protocols would
differ by at most one trial. Thus, if $S_1$ and $S_2$ are the two stimuli,
their protocols might look like the following:

$S_1$: 1111111000000...
$S_2$: 1111111100000...








In general any tendency for R-level learning should produce "co-variation"
in protocol pairs of related items. Thus if $Z_i$ is a statistic for the
i-th protocol, $\rho_{Z_1 Z_2}$ should be non-zero.

To illustrate, let $x_n^1$ and $x_n^2$ be the error-success (1-0) random
variables for $S_1$ and $S_2$. Suppose the one-element R-level model
holds with c, g free. It is a simple calculation to derive
$\rho_{x_n^1 x_n^2}$:

(2.4) $\rho_{x_n^1 x_n^2} = \dfrac{\mathrm{Cov}(x_n^1, x_n^2)}{\sigma_{x_n^1}\,\sigma_{x_n^2}}$,

where

$\mathrm{Cov}(x_n^1, x_n^2) = E(x_n^1 x_n^2) - E(x_n^1)E(x_n^2) = \Pr(x_n^1 = 1, x_n^2 = 1) - \Pr(x_n^1 = 1)\Pr(x_n^2 = 1)$.

Taking the two possible orders of presentation of $S_1$ and $S_2$ on
P-trial n into consideration, we have

$\Pr(x_n^1 = 1, x_n^2 = 1) = $ [factor illegible in source] $\Pr(x_n^1 = 1)$.

Since

(2.5) $\sigma^2_{x_n^1} = \sigma^2_{x_n^2} = $ [expression illegible in source]

and

(2.6) $E(x_n^1) = E(x_n^2) = \Pr(x_n^1 = 1)$,

we have

(2.7) $\rho_{x_n^1 x_n^2} = $ [expression illegible in source].

The function starts at a value 0 on trial 1 and increases exponentially
to an asymptote of [value illegible in source].
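
Whatever the model, the sample version of this trial-by-trial correlation can be computed directly from paired protocols. The sketch below is an illustration under stated assumptions: the function name `phi_by_trial` and the data layout (one 0/1 protocol per subject for each of the two related items, all of equal length) are hypothetical, not the writer's procedure.

```python
import math


def phi_by_trial(protocols_1, protocols_2):
    """Product-moment correlation of x_n^1 and x_n^2 across subjects, per trial n.

    protocols_1[j][n] and protocols_2[j][n] are the 0/1 (success/error)
    outcomes of subject j on P-trial n for the two related items.
    Returns one correlation per trial; NaN where either item has zero variance.
    """
    n_trials = len(protocols_1[0])
    k = len(protocols_1)
    result = []
    for n in range(n_trials):
        a = [subject[n] for subject in protocols_1]
        b = [subject[n] for subject in protocols_2]
        mean_a, mean_b = sum(a) / k, sum(b) / k
        cov = sum(x * y for x, y in zip(a, b)) / k - mean_a * mean_b
        var_a, var_b = mean_a * (1 - mean_a), mean_b * (1 - mean_b)
        if var_a > 0 and var_b > 0:
            result.append(cov / math.sqrt(var_a * var_b))
        else:
            result.append(float("nan"))
    return result
```

Late trials, where almost no errors remain, return NaN or very unstable values, which is the practical face of the sampling-variance problem discussed next.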



Of course the sample variance of the statistic $\rho_{x_n^1 x_n^2}$ will
increase with n as fewer errors are made. Although not presented here,
this sampling variance could be calculated from the model. Thus the
properties, including power, of a test of zero $\rho_{x_n^1 x_n^2}$ could
be established. In general, $\rho_{x_n^1 x_n^2}$ should be fairly simple
to compute for any R-level model (or even a model which allows both P- and
R-level learning such as the one-element multi-level model presented in
the next chapter) provided the model is in any way tractable.

Other statistics could have been chosen for a correlation analysis.
Several experimenters have empirically correlated total errors in an
effort to ascertain relationships among units in the learning phase
(Suppes et al., 1962; Crothers and Suppes, in press). For example, in
Chapter 5 of the Crothers and Suppes book, subjects were required to
make multiple-choice grammatical ending responses to Russian nouns.
Several grammatical classes served as the "concepts" to be learned.
Various theoretical schemes for predicting the course of learning were
presented. They were assessed on their ability to account for the
pattern of pair-wise part correlations of total errors to the various
concept classes.

This writer would suggest that matrices of part correlations of
statistics such as total errors or trial of the last error could be used
often as a device for checking whether some learning is taking place on
a higher level than analysis. This procedure can be illustrated by an
unpublished experiment by D. E. Rumelhart and the writer. Only the
analysis relevant to the correlation method will be presented now.

In this study college-age subjects learned to pair 24 highly structured
stimuli to 6 response classes by the anticipation procedure. The
S-R pairs (which were consonant letters) had the following structure:

      Stimulus    Response
  1.    ACE          1
  2.    ACF          1
  3.    ADE          1
  4.    ADF          1
  5.    BCE          2
  6.    BCF          2
  7.    BDE          2
  8.    BDF          2
  9.    IGK          3
 10.    IGL          3
 11.    JGK          3
 12.    JGL          3
 13.    IHK          4
 14.    IHL          4
 15.    JHK          4
 16.    JHL          4
 17.    OQM          5
 18.    ORM          5
 19.    PQM          5
 20.    PRM          5
 21.    OQN          6
 22.    ORN          6
 23.    PQN          6
 24.    PRN          6

It should be noted that successive groups of four stimuli have a common
letter and are paired to the same response.



The learning data appear very complicated and their analysis is
only partially complete at this writing. It appears that learning has
taken place on several levels in the experiment. This fact was tested
by correlating trials of the last error to items in each group of four.
Without presenting the details of this analysis here, it demonstrated a
highly significant tendency for items in a 4-unit to have similar last
error trials. By subtracting each subject's mean trial of the last
error from each of his 24 items, a control for individual differences
was attempted, i.e., the data for the analysis were of the form

$L_{ij} - \dfrac{1}{24}\sum_{i=1}^{24} L_{ij}$,

where $L_{ij}$ is the last error trial for item i, subject j. More will
be said about this experiment in the next section of this chapter and in
Chapter 6.
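
The control for individual differences described above amounts to centering each subject's last-error trials on that subject's own mean. A minimal sketch follows; the function name and the nested-list layout (`last_error[j][i]` for subject j, item i) are assumptions made for illustration only.

```python
def center_by_subject(last_error):
    """Subtract each subject's mean last-error trial from each of that subject's items.

    last_error[j][i] is the trial of the last error for subject j on item i.
    Returns a list of the same shape holding the deviation scores.
    """
    centered = []
    for subject_row in last_error:
        subject_mean = sum(subject_row) / len(subject_row)
        centered.append([value - subject_mean for value in subject_row])
    return centered
```

Correlations among items within a group of four would then be computed on these deviation scores rather than on the raw last-error trials.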

Thus far we have considered in some detail the implications of 
learning on a level higher than the level of analysis. The conclusion 
was to compute correlations of various statistics on the units of the 
level of analysis. Any significant nonzero correlation could be
interpreted as a possible indication of higher-level learning.

Next we return to the question of the implications of learning at 
a lower level than data analysis. The answers here are quite simple. 
Consider the R-level analysis of P-level data. It is a property of
P-level learning, regardless of the model, that every M trials there
will be a jump in the learning curve, i.e., $\Pr(x_N = 1)$ will be flat
in periods of M trials. This result comes directly from the anticipation
procedure and the assumption of P-level learning, which implies
that items are learned independently.
In addition a statistic such as total errors is easy to work with.
Regardless of the model we have

$\mathrm{Var}(T_R) = M\,\mathrm{Var}(T_P)$,

where $T_R$ is the total error random variable for the R-level and $T_P$
is the total error random variable for an item. This result comes from
the independence assumption.
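
The step behind this relation can be written out. The notation $T_P^{(i)}$ for the total errors on the i-th item of the block is introduced here only for illustration, and the items are assumed to be independent and identically distributed, as P-level learning implies.

```latex
\begin{align*}
T_R &= \sum_{i=1}^{M} T_P^{(i)}, \\
\mathrm{Var}(T_R) &= \sum_{i=1}^{M} \mathrm{Var}\bigl(T_P^{(i)}\bigr)
  + 2\sum_{i<j} \mathrm{Cov}\bigl(T_P^{(i)}, T_P^{(j)}\bigr) \\
  &= M\,\mathrm{Var}(T_P) \qquad \text{(all covariances vanish under independence).}
\end{align*}
```

An observed $\mathrm{Var}(T_R)$ well above $M\,\mathrm{Var}(T_P)$ would therefore point to positive covariation among items, i.e., to some learning above the P-level.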

To illustrate these methods consider the experiment by Rumelhart
and the writer discussed on pp. 24-26. $\Pr(x_N = 1)$ is plotted in
Fig. 2.1 for the R-level analysis (M = 4). A definite tendency for
$\Pr(x_N = 1)$ to drop within a cycle indicates some R-level learning.
The sizable jumps in $\Pr(x_N = 1)$ between cycle 2 and cycle 5 might
indicate some P-level learning.

Fig. 2.1. R-level Learning Curve for the
list depicted on p. 25 (M = 4).

The R-level learning curve is also used to show some R-level and some
P-level learning for other experiments in Chapter 6 (p. 154, Fig. 6.12).

In addition to the learning curve and $\mathrm{Var}(T_R)$,
$\Pr(x_{N+1} = 1 \mid x_N = 1)$ should jump on trials kM + 1,
k = 1, 2, ..., and the stationarity curve should rise over trials. Of
course this latter feature could be accounted for by other P-level models
such as the two-element model (Suppes and Ginsberg, 1965).



Conclusion 

In this chapter we have discussed some of the implications of
learning on various levels. Two methods of inferring level of learning
have been developed, though not exhaustively. The first is to assume a
model and then derive statistics for analyses on several levels.
Inferences can then be made on the basis of the fit of the model to the
data. The second method involves considering the general properties of
the assumption of learning at a certain level. These properties, which
depend on the mode of item presentation, suggest several statistical
analyses, e.g., $\rho_{x_n^1 x_n^2}$. This chapter will have served its
purpose if it convinces the reader that valuable inferences can be made
from analyses of data on several levels.



CHAPTER 3



THE ALL-OR-NONE MULTI-LEVEL MODEL

The derivations and results of the previous chapter dealt mainly
with the case where learning was assumed to take place on either the
P-level or the R-level but not both. In cases where there would be any
question of which level learning takes place at, the more likely
possibility would seem to be some learning on both levels. The question
then arises as to whether extant verbal learning models, such as the
one-element model, can naturally be generalized to allow for learning on
several levels simultaneously. In this chapter a simple generalization
of the one-element model to allow for such simultaneous learning is
developed. In the next chapter a framework is proposed for axiomatizing
other multi-level models.

The model to be developed in this chapter (the all-or-none multi-level
model) is intended to be a simple and natural extension of the
one-element P- and R-level models. It is not intended to represent a
theoretical stand on the issue of how paired-associate learning takes
place. So, rather than regarding this model as an addition to the
crowded literature on paired-associate models, it should be regarded
as an exercise in the synthesis of extant models.

Axioms for the Model 

In the development to follow we will assume that subjects are
learning a list with a structure similar to the list on p. 11, Chapter 2.
In general, we assume that the list consists of J groups of M stimuli,
where the members of any group are mutually related and each paired with
the same response. By the stimuli in a group being related is meant
that there is some common rule or common structure to all the stimuli
connected to any particular response. Thus in the previously mentioned
list the three rules are respectively: mathematicians are ones, chess
players are twos, and football players are threes. No particular
presentation schedule is assumed, but the model will be axiomatized under
the assumption that on any presentation of a member of the list the
subject first gives a response and then receives a paired presentation of
the stimulus and its correct response (i.e., any particular presentation
is like a particular presentation for the anticipation procedure).

We wish to generalize the one-element model to allow for the
possibility of learning the rule on any presentation of a relevant S-R pair
and, in addition, to allow learning of that particular S-R pair if the
rule is not learned. Accordingly, we will define an unlearned state U,
an instance (paired-associate) state P, and a rule-learned state R.
We require that each of the M items be in one and only one of these
states on any trial. Transitions among these states are possible only
when an item is presented, and the probabilities of these transitions
do not depend on the past history of presentations and responses but only
on the current state of the presented item.

The major departure from usual models is the assumption that if any
item makes a transition to the R-state all other items on that trial
move to the R-state. Thus an item's state may change when it is not
presented. Finally performance (probability of a correct response) is
assumed to be at a level g in state U and at a level 1 in states
P and R.
More formally, let N index presentations of items in a block of
M (i.e., R-trials); axioms for the all-or-none multi-level model are as
follows.

1. Each of the M items is represented as being in exactly one of
three states on any trial. The states are an unlearned state
U, an instance-learned state P, and a rule-learned state R.

2. All items start in state U, i.e., all items are in state U on
R-trial N = 1.

3. When an item is presented it can change its state, and the
probabilities governing these changes depend only on the current state
of the presented item and not on the states of any of the other
M-1 items, the past states of any of the M items, or the trial
number.

4. The assumptions about transitions to new states for a presented
item are exhibited in the following stochastic matrix. The ij-th
term in the matrix is the probability a presented item in state i
will reside in state j on the next R-trial (i, j ∈ {U,P,R}).

                                 State of item on
                            R-trial after presentation

                                 R      P      U
(3.1)   State of item      R     1      0      0
        on trial of        P     c     1-c     0
        presentation       U     r      p    1-r-p

5. If the presented item makes a transition to state R,
the other M-1 items immediately make a transition to state R so
that on all R-trials after this event all M items are in state R.
Other than this possibility of transition, items not presented
remain in their current states.

This axiom can be summarized by the following rule:

a. If the presented item is in state U, the other M-1 items all
stay in their current states with probability 1-r and all move to
state R with probability r;

and

b. If the presented item is in state P, the other M-1 items all
stay in their current states with probability 1-c and all move
to state R with probability c.

6. Let $x_N$ be a random variable defined by

$x_N = 1$ if error on R-trial N,
$x_N = 0$ if success on R-trial N.

Then

$\Pr(x_N = 1) = 1-g$ if presented item in U,
$\Pr(x_N = 1) = 0$ if presented item in P,
$\Pr(x_N = 1) = 0$ if presented item in R.

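
The axioms can be exercised by simulation. The following is a minimal sketch, not the writer's procedure: the function names, the random within-cycle presentation order, and the seed argument are assumptions introduced for illustration.

```python
import random


def present(states, i, r, p, c, g, rng):
    """Present item i once; return (error, new_states) under Axioms 3-6."""
    current = states[i]
    # Axiom 6: errors occur only in state U, with probability 1 - g.
    error = 1 if (current == "U" and rng.random() >= g) else 0
    new_states = list(states)
    u = rng.random()
    if current == "U":
        if u < r:                      # rule learned: whole block moves to R
            new_states = ["R"] * len(states)
        elif u < r + p:                # only the presented item is learned
            new_states[i] = "P"
    elif current == "P":
        if u < c:                      # rule learned from the P state
            new_states = ["R"] * len(states)
    return error, new_states


def simulate_block(M, n_cycles, r, p, c, g, seed=0):
    """Run n_cycles cycles over one block of M items, random order in each cycle."""
    rng = random.Random(seed)
    states = ["U"] * M                 # Axiom 2: all items start in U
    errors = []
    for _ in range(n_cycles):
        order = list(range(M))
        rng.shuffle(order)
        for i in order:
            error, states = present(states, i, r, p, c, g, rng)
            errors.append(error)
    return errors, states
```

Setting r = c = 0 keeps every item out of state R, and setting p = 0 with r = c removes item-level learning, which are the two special cases taken up in the first theorem below.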

Theorems and Derivations 

Some of the properties of this model will be presented in the theorems
and derivations to follow. The first theorem shows that, under
appropriate restrictions on the parameters of the all-or-none multi-level
model, the one-element P-level model and the one-element R-level model
are obtained.

Theorem 3.1

a. If r = c = 0 and p ∈ (0,1), the all-or-none multi-level model
is equivalent to a one-element P-level model.

b. If p = 0, r = c, and r, c ∈ (0,1), the model is equivalent to
a one-element R-level model.

Proof

a. If r = c = 0 and p ∈ (0,1), items can change their state
only when they are presented. Since all items start in state U, state
R can never be obtained. Thus the restriction implies the all-or-none
multi-level model can be summarized by the following stochastic matrix
for each item $S_\alpha$ ($\alpha$ = 1, 2, ..., M):

                 $P_{n_\alpha+1}$   $U_{n_\alpha+1}$     Pr(correct | row state)
  $P_{n_\alpha}$        1                 0                        1
  $U_{n_\alpha}$        p                1-p                       g

where $n_\alpha$ indexes presentations of item $S_\alpha$ (i.e., P-level
trials on any item). This is the one-element P-level model.

b. If p = 0, r = c, r, c ∈ (0,1), all M items start in U and
any presentation results in a transition of all the items to state R
with probability r. If N indexes R-level trials, the following
stochastic matrix for all M items can be derived:

             $R_{N+1}$    $U_{N+1}$     Pr(correct | row state)
  $R_N$          1            0                   1
  $U_N$          r           1-r                  g

This is the matrix for the one-element R-level model. ||

Some additional notation will facilitate the statement of the next
theorem. Suppose the M items in a block are ordered: $S_1, S_2, \ldots, S_M$.
For each R-trial N define the state variable for the block, $\vec{T}_N$, to
be $(T_{1,N}, \ldots, T_{M,N})$, where $T_{i,N}$ is the state of item $S_i$
on trial N for i = 1, ..., M. The preceding axioms for the all-or-none
multi-level model could easily be written in terms of the random
variable $\vec{T}_N$, but this will not be done here (see p. 45, Chapter 4).

The next property of the model to be developed has implications for 
experiments involving post -learning transfer. Suppose that following an 
initial learning phase subjects are asked to make "best guess" responses 
to new stimuli. Suppose further that the new stimuli are constructed
similarly to stimuli in one of the blocks of M items, i.e., the new
stimuli share a relationship or a rule with the other M stimuli in the
block. It is a consequence of the following theorem that the more initial
training trials on the block of M related items, the higher is the
probability of the appropriate transfer response to these new items.



Theorem 3.2

If r, p, c ∈ (0,1) and N indexes R-trials on a block of M
stimuli, then

$\lim_{N \to \infty} \Pr(\vec{T}_N = (R,R,\ldots,R)) = 1$.

Proof

The theorem follows from the fact that state $\vec{T} = (R,R,\ldots,R)$ is
an absorbing state. Let $\theta = \min\{c, r\}$. By hypothesis $\theta > 0$.
Since, on any trial N, the block of M items has either probability r or
c of moving into state (R,R,...,R), we have

$\Pr(\vec{T}_N = (R,R,\ldots,R)) \ge 1 - (1-\theta)^{N-1}$,

hence

$1 \ge \lim_{N \to \infty} \Pr(\vec{T}_N = (R,R,\ldots,R)) \ge \lim_{N \to \infty} \left[1 - (1-\theta)^{N-1}\right] = 1$.

The above inequality implies

$\lim_{N \to \infty} \Pr(\vec{T}_N = (R,R,\ldots,R)) = 1$. ||
The next Theorem and Lemma imply that the order in which items are
presented does not affect the probabilities of being in the various
states.

Theorem 3.3

Suppose $S_i$ and $S_j$ are presented on R-trials N and N+1,
i, j = 1, 2, ..., M. Then, for all possible states $\vec{t}$, $\vec{t}'$ of
the set of M items,

$\Pr(\vec{T}_{N+2} = \vec{t}' \mid \vec{T}_N = \vec{t},\ S_{i,N},\ S_{j,N+1}) = \Pr(\vec{T}_{N+2} = \vec{t}' \mid \vec{T}_N = \vec{t},\ S_{j,N},\ S_{i,N+1})$.

Proof

The apparatus necessary for a completely rigorous proof of this
Theorem will not be developed until the next chapter. What follows is
an outline of the main ideas in the proof. If $\vec{t} = (R,R,\ldots,R)$, the
result is immediate, so assume $\vec{t} \ne (R,R,\ldots,R)$. Either
$\vec{t}' = (R,R,\ldots,R)$ or it does not. If $\vec{t}' = (R,R,\ldots,R)$, then
commutativity follows by noting, for all real numbers a, b,

$a + (1-a)b = b + (1-b)a$.

Using this fact with a = r, b = c establishes the result for
$\vec{t}' = (R,R,\ldots,R)$. If $\vec{t}' \ne (R,R,\ldots,R)$, then a presentation
of $S_i$ can affect only the state of item $S_i$ (similarly for $S_j$).
Since these effects are independent, the order of appearance of $S_i$ and
$S_j$ does not matter. ||






The preceding theorem will receive more attention in the next
chapter (p. 47). Next we state a lemma which provides a strong test
for the all-or-none multi-level model.

Lemma 3.1

Suppose in the first N R-trials $k_i$ presentations of $S_i$ are to
be made, where i = 1, 2, ..., M and

$\sum_{i=1}^{M} k_i = N$.

Then the order in which these stimuli are presented does not affect
the probability of being in the various states on trial N + 1.

Proof

The lemma follows by repeated application of pairwise commutativity
established in Theorem 3.3. ||
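
Pairwise commutativity can be checked numerically by pushing a probability distribution over block states through two presentations in either order. The sketch below is an illustration only: the dictionary representation of the distribution and the names `push_presentation` and `all_r_prob` are assumptions, and the transition rule is simply the stochastic matrix of the axioms.

```python
from collections import defaultdict


def push_presentation(dist, i, r, p, c):
    """Update a distribution over block states for one presentation of item i.

    dist maps tuples such as ("U", "P", "U") to probabilities.
    """
    new_dist = defaultdict(float)
    for state, prob in dist.items():
        all_r = ("R",) * len(state)
        if state[i] == "R":            # absorbing: nothing changes
            new_dist[state] += prob
        elif state[i] == "U":
            new_dist[all_r] += prob * r
            new_dist[state[:i] + ("P",) + state[i + 1:]] += prob * p
            new_dist[state] += prob * (1 - r - p)
        else:                          # presented item is in state P
            new_dist[all_r] += prob * c
            new_dist[state] += prob * (1 - c)
    return dict(new_dist)


def all_r_prob(dist):
    """Probability that the whole block is in state R."""
    return sum(prob for state, prob in dist.items()
               if all(s == "R" for s in state))
```

Starting from any single block state, applying `push_presentation` for items i and j in the two possible orders should give the same distribution up to rounding, including when r differs from c.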

The preceding theorem and lemma provide both a strong test for the 
all-or-none multi-level model as well as a considerable reduction in the 
complexity of derivations from the model under certain presentation 
schedules. These points will be brought out in more detail in Chapter
5 (p. 100), where an additional analysis of the model (in terms of the
framework to be developed in the next chapter) is presented.




Derivations for the Anticipation Procedure

The model can also be used to provide a synthesis for the results
of the preceding chapter. Under the assumption that r = c (rule
learning is equiprobable from both the P and U states) the multi-level
model reduces to a model that postulates two simultaneous all-or-none
processes: one for P-level learning and one for R-level learning.
In the next few pages statistics for both the P-level and R-level will
be presented under the assumption of an anticipation presentation
schedule.

It should also be clear that under suitable additional restriction
of the parameters p and r, results relevant to the four possibilities
in Tables 2.1 and 2.2 of Chapter 2 can be obtained. Table 3.1 indicates
the parameter restrictions which yield the four possibilities analyzed
in the previous chapter (based on Theorem 3.1).

Table 3.1

Conditions under which the All-or-None Multi-level Model Reduces
to the Four Analyses of the Preceding Chapter (Tables 2.1, 2.2).

                            Restrictions on            Level of data analysis
  Chapter 2 analysis        multi-level parameters     of multi-level model

  (P-ANALYSIS, P-MODEL)     c = r = 0                  P
  (P-ANALYSIS, R-MODEL)     c = r, p = 0               P
  (R-ANALYSIS, P-MODEL)     c = r = 0                  R
  (R-ANALYSIS, R-MODEL)     c = r, p = 0               R


Thus, a statistic derived for the multi-level model should reduce to its
corresponding expression in Table 2.1 or Table 2.2 of the preceding
chapter if the indicated parameter restrictions are made. Exceptions
are when r appears in the denominator of an expression, e.g., $\Pr(x_n = 1)$
for the P-level analysis (see Table 3.2).

Only $\Pr(x_N = 1)$ and $\Pr(x_{N+1} = 1 \mid x_N = 1)$ have been presented
for the R-level analysis. Some of the other results cannot be obtained by
this writer in closed form, and others seem much too cumbersome and
uninformative to present. The results of the P- and R-level analysis of
the restricted (r = c) multi-level model appear in Table 3.2 to follow.
It should be reiterated that the anticipation procedure is assumed and
that each group of related items has M members. For selected
derivations of these statistics, the reader is referred to Appendix 1.
Finally, K(N) refers to the cycle number corresponding to R-trial N (Eq. 2.5).

The all-or-none multi-level model is an intermediate model to the
P- and R-level models in the sense that it postulates both P- and R-level
learning. It is of some interest to compare the analyses of Table 3.2
with those analyses of the P and R models in the last chapter (Tables
2.1 and 2.2, pp. 18, 20, respectively).
The results in Table 3.2 for the P-level analysis bear a resemblance
to the results for the P-level analysis of the one-element R-level model
(Table 2.1). $\Pr(x_n = 1)$ is a geometric function of n, and Pr(T = k)
and Pr(L = n) are geometric distributions. Similarly
$\Pr(x_{n+1} = 1 \mid x_n = 1)$, b, and $\Pr(x_n = 1 \mid L > n)$ are
constants. Even with these similarities (which also hold for the usual
one-element model) the multi-level model is an alternative to the P-level
analysis of the one-element R-level model. This can be shown by comparing
selected statistics in Table 3.2 with those of Table 2.1.

Denote by R the one-element R-level model and by L the multi-level
model; assume both models are analyzed on the P-level. Denote the
parameters of R by c', g' and those for L by p, r, g. Assume the
models are equivalent. Then, by equating $\Pr(x_n = 1 \mid L > n)$, we have
the functional identity

(3.2) $g' = g$.
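Table 3.2. P- and R-level Analyses of the
All-or-None Multi-level Model (r = c).

1. $\Pr(x_n = 1)$
   P-level analysis: $\dfrac{(1-g)[1-(1-r)^M]}{Mr}\,[(1-p-r)(1-r)^{M-1}]^{n-1}$
   R-level analysis: [illegible in source]

2. $\Pr(x_{n+1} = 1 \mid x_n = 1)$
   P-level analysis: $(1-g)(1-p-r)(1-r)^{M-1}$
   R-level analysis: $\dfrac{1-g}{M}\left[(1-r-p) + (M-1)(1-r)(1-p)^{K(N)}\right]$ if N mod M = 0;
                     $(1-g)(1-r)(1-p)^{K(N)-1}$ if N mod M ≠ 0.

3. b
   P-level analysis: $\dfrac{1-(1-p-r)(1-r)^{M-1}}{1-g(1-p-r)(1-r)^{M-1}}$

4. i. $\Pr(T = 0)$
   P-level analysis: $1 - \dfrac{(1-g)[1-(1-r)^M]}{Mr[1-g(1-p-r)(1-r)^{M-1}]}$

   ii. $\Pr(T = k)$ (k > 0)
   P-level analysis: [illegible in source]

   iii. $E(T)$
   P-level analysis: [illegible in source]

   iv. $\mathrm{Var}(T)$
   P-level analysis: $E(T)\,[\,\cdot\, - E(T)]$ [first term in brackets illegible in source]

5. i. $\Pr(L = 0)$
   P-level analysis: $1 - \dfrac{(1-g)[1-(1-r)^M]}{Mr[1-g(1-p-r)(1-r)^{M-1}]}$

   ii. $\Pr(L = n)$ (n > 0)
   P-level analysis: $\dfrac{(1-g)[1-(1-r)^M][1-(1-p-r)(1-r)^{M-1}]}{Mr[1-g(1-p-r)(1-r)^{M-1}]}\,[(1-p-r)(1-r)^{M-1}]^{n-1}$

   iii. $E(L)$
   P-level analysis: $\dfrac{(1-g)[1-(1-r)^M]}{Mr[1-g(1-p-r)(1-r)^{M-1}][1-(1-p-r)(1-r)^{M-1}]}$

6. $\Pr(x_n = 1 \mid L > n)$
   P-level analysis: $(1-g)$
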
Comparing $\Pr(x_{n+1} = 1 \mid x_n = 1)$ yields the identity

$(1-g')(1-c')^{M} = (1-g)(1-p-r)(1-r)^{M-1}$,

which, inserting (3.2), reduces to

(3.3) $(1-c')^{M} = (1-p-r)(1-r)^{M-1}$.

Now comparing $\Pr(x_n = 1)$ for both models yields the identity

$\dfrac{(1-g')[1-(1-c')^{M}]}{Mc'}\,[(1-c')^{M}]^{n-1} = \dfrac{(1-g)[1-(1-r)^{M}]}{Mr}\,[(1-p-r)(1-r)^{M-1}]^{n-1}$.

Substituting (3.2) and (3.3) yields

(3.4) $\dfrac{1-(1-c')^{M}}{c'} = \dfrac{1-(1-r)^{M}}{r}$.

This last identity implies c' = r. Now

$c' = r$

and

$(1-c')^{M} = (1-p-r)(1-r)^{M-1}$

only if p = 0, but in this case model L becomes model R. Therefore
we conclude that, provided p ≠ 0, the multi-level model is not
equivalent to an R-level one-element model analyzed on the P-level.
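
The argument can be checked numerically. The sketch below evaluates the two P-level statistics equated in the text for each model; the function names are hypothetical, and the parameter values used in any check are illustrative assumptions.

```python
def p_level_stats_r_model(n, M, c_prime, g_prime):
    """One-element R-level model analyzed on the P-level.

    Returns (Pr(x_n = 1), Pr(x_{n+1} = 1 | x_n = 1)).
    """
    lead = (1 - g_prime) * (1 - (1 - c_prime) ** M) / (M * c_prime)
    per_cycle = (1 - c_prime) ** M
    return lead * per_cycle ** (n - 1), (1 - g_prime) * per_cycle


def p_level_stats_multi_level(n, M, p, r, g):
    """Multi-level model with r = c analyzed on the P-level (same two statistics)."""
    lead = (1 - g) * (1 - (1 - r) ** M) / (M * r)
    per_cycle = (1 - p - r) * (1 - r) ** (M - 1)
    return lead * per_cycle ** (n - 1), (1 - g) * per_cycle
```

With g' = g and c' = r the two functions agree when p = 0 and part company as soon as p > 0, which is the conclusion reached above.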

Now we turn to a comparison of the R-level analysis of the multi-level
model and the (R,P) analysis in Table 2.2. The two statistics
presented in Table 3.2 for the multi-level model bear similarities to
their counterparts of Table 2.2 in the preceding chapter. $\Pr(x_N = 1)$
jumps on trials kM+1 for k = 1, 2, ... for both models, and
$\Pr(x_{N+1} = 1 \mid x_N = 1)$ is constant in successive blocks of M
trials and jumps on trials kM+1. The major difference between the two
models is in $\Pr(x_N = 1)$ between jump points (i.e., within a cycle).
Within a cycle the one-element model $\Pr(x_N = 1)$ is flat; whereas, for
the multi-level model, it is geometric in shape. To see this,
$\Pr(x_N = 1)$ for the multi-level model is plotted, for g = 1/5, r = 0.1,
p = 0.5, M = 5, in Fig. 3.1 below.

Fig. 3.1. R-level Learning Curve for
All-or-None Multi-level Model (g = 1/5,
r = 0.1, p = 0.5, M = 5).



Next we discuss correlation of item protocols for the multi-level
model. Just as for the one-element R-level model, one would expect any
two item protocols for related items to "co-vary." In the preceding
chapter we introduced $\rho_{x_n^1 x_n^2}$ as a trial-dependent measure of
this co-variation (where $x_n^i$ is the error-success random variable for
item i). Assume M = 2; then $\rho_{x_n^1 x_n^2}$ for the multi-level model
(r = c) is

(3.5) $\rho_{x_n^1 x_n^2} = \dfrac{\Pr(x_n^1 = 1, x_n^2 = 1) - \Pr(x_n^1 = 1)\Pr(x_n^2 = 1)}{\sigma_{x_n^1}\,\sigma_{x_n^2}} = \dfrac{\Pr(x_n^1 = 1 \mid x_n^2 = 1) - \Pr(x_n^1 = 1)}{1 - \Pr(x_n^1 = 1)}$

[closed-form expression in g, r, and p illegible in source].

$\rho_{x_n^1 x_n^2}$ is different for the one-element R-level model than
for the multi-level model. For the one-element R-level model,
$\rho_{x_n^1 x_n^2}$ starts at 0 for n = 1 and increases to an asymptote of
[value illegible in source] as n increases. However,
$\rho_{x_n^1 x_n^2}$ for the multi-level model starts at 0, reaches a
maximum for some n > 0, and then decreases to an asymptotic value of 0.
This latter fact is true because, for large n, joint errors are only
made to pairs of items not in state R and not both in state P. Given
one item is not in P, the probability the other one is in P increases
with n, i.e., the P-level process prior to the trial of transition
into state R proceeds independently for the two items.

An analysis of the general multi-level model with r ≠ c will be
postponed until Chapter 5 (p. 98). This is because analysis of the
model is greatly facilitated by a reformulation in terms of a general
framework for analyzing multi-level models. The direction of this
reformulation (alternative axiomatization) will be presented in the first
section of the next chapter.
CHAPTER 4



GENERAL FRAMEWORK FOR MULTI-LEVEL MODELS

In the preceding chapter we have illustrated how a particular
multi-level model might be developed. There is a property of this
multi-level model that differentiates it from most other learning models.
This property is that, under certain conditions, the M-1 items not
presented on a trial change their states; whereas, under other
conditions, only the presented item changes its state. In the axiom set
for that model (Chapter 3, p. 31), it was awkward to formulate these
properties. Thus the statement of Theorems 3.2 and 3.3 was greatly
facilitated by the introduction of the random variable $\vec{T}_N$
(Chapter 3, p. 33), which keeps track of the states of all M items in a
block. In addition some further analyses of the model (Chapter 5,
pp. 98-105) are greatly simplified by formulating the all-or-none
multi-level model in terms of $\vec{T}_N$.

The organization of this chapter will be as follows. First the
direction of reformulating the all-or-none multi-level model in terms
of $\vec{T}_N$ will be indicated along with some of the advantages of this
formulation over the formulation of the preceding chapter. This work
will suggest a general framework within which many models that allow
learning on several levels (or, equivalently, that allow items in a list
to mutually affect each other in the course of learning) can be axiomatized.

Before the framework is formalized, an indication of its intended
scope will be presented. The scope of the framework will be presented
by organizing the classes of models to which the framework can be
applied under three headings. These headings will refer to three types
of item dependencies (item interactions) permitted by the framework,
and several examples of extant models embodying each type of dependency
will be presented.

Finally the framework will be developed formally along with several
theorems that can be applied to the analysis of any model axiomatized
within the framework. The theorems fall into two classes. The first
few theorems (Theorems 4.1, 4.2, 4.3) concern how to compute state
probabilities and response probabilities for a model as a function of
properties of the model and the presentation schedule. The latter
theorems (Theorems 4.4, 4.5) concern how a model can be simplified along
the lines of the particular dependencies it postulates, i.e., the
framework will require that a model be stated in some generality and these
theorems will concern how to reduce the generality in individual cases.

The next chapter will present applications of the theorems to the analysis
of the mixed model (Atkinson and Estes, 1965), the all-or-none
multi-level model of Chapter 3, and a version of Restle's strategy-selection
theory (Restle, 1962, 1964, Chapter 4).

To recapitulate the organization of this chapter, we will first 
reformulate the all-or-none multi-level model. This reformulation will 
suggest a general framework within which several models can be analyzed. 
Before presenting the formal aspects of the framework, an indication of 
the types of models which can be axiomatized in terms of the framework 
will be presented. Finally the framework will be developed along formal 
lines, i.e., definitions and theorems. Now we turn to the reformulation
of the multi-level model.

Reformulation of the All-or-None Multi-level Model

Let us reconsider the all-or-none multi-level model. Suppose the
M items in a block are ordered, $(S_1, S_2, \ldots, S_M)$. Let $\vec{T}$ be a possible
M-tuple of states for the M items, e.g., $\vec{T} = (R,R,\ldots,R)$. Define $\mathcal{S}$
to be the set of all possible states of the M items. The axioms on
p. 31 of the preceding chapter imply

(4.1) $\mathcal{S} = \left( \mathop{\times}_{k=1}^{M} \{U,P\} \right) \cup \{(R,R,\ldots,R)\}$,

where $\mathop{\times}_{k=1}^{M} \{U,P\}$ is the M-fold Cartesian product of the set $\{U,P\}$.
$\mathcal{S}$ has $2^M+1$ members.

It is a property of the axioms for the model that, if the current
state of the M items, $\vec{T}_N$, is known, and the presented item, $S_{i,N}$,
is known, for N = 1, 2, ..., then the probabilities of being in the various
$2^M+1$ states in $\mathcal{S}$ on trial N+1 are determined and are independent
of past presentations, past states of the M items, and the trial index
N. Suppose that the $2^M+1$ members of $\mathcal{S}$ are ordered,
$\mathcal{S} = (\vec{T}_1, \vec{T}_2, \ldots, \vec{T}_{2^M+1})$;
then it is convenient to summarize the preceding remark by noting that
the model implies that each of the items, $S_i$, has an associated set
of transition probabilities from $\mathcal{S}$ to $\mathcal{S}$, where, for i = 1, 2, ..., M
and for all $\vec{T}, \vec{T}' \in \mathcal{S}$,

$\Pr(\vec{T}'_{N+1} \mid S_{i,N}, \vec{T}_N)$

is determined (independent of N). It is desirable to represent these
probabilities of transition from states in $\mathcal{S}$ to states in $\mathcal{S}$ by a
stochastic matrix $\mathbb{P}_i$ for each item $S_i$, i = 1, 2, ..., M. Then $\mathbb{P}_i$ is a
$(2^M+1) \times (2^M+1)$ matrix of the probabilities of transition from states
in $\mathcal{S}$ to states in $\mathcal{S}$ given $S_i$ is presented. Thus, if item $S_i$ is
presented on some trial N, then the associated matrix $\mathbb{P}_i$ determines
the probabilities of being in the various states, $\vec{T} \in \mathcal{S}$, given the
current state of the set of M items. The matrices $\mathbb{P}_i$ are analogous
to the stochastic matrices used to represent Markov learning models,
e.g., the one-element P-level model has an associated stochastic matrix

$$P = \begin{array}{c|cc} & L_{n+1} & U_{n+1} \\ \hline L_n & 1 & 0 \\ U_n & c & 1-c \end{array}$$

As will be seen in Theorems 4.1, 4.2, and 4.3, these matrices, $\mathbb{P}_i$, will
be used to compute the probabilities of being in the various states
given certain item presentation orders in much the same way as P is
used to compute these probabilities for the one-element P-level model.
The major difference in the two cases will be that the $\mathbb{P}_i$ matrices
are used to compute the probabilities of being in various states of the
entire list, whereas P is used to compute the probability that the
presented item is in various states.
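
As a minimal sketch of how the one-element matrix P is used, the
following Python fragment applies P repeatedly to a row vector of state
probabilities over the states (L, U). The function names and the value
c = 0.25 are illustrative assumptions and do not come from the text.

```python
def vec_mat(p, P):
    """Multiply a row vector p by a matrix P (both plain Python lists)."""
    return [sum(p[i] * P[i][j] for i in range(len(p)))
            for j in range(len(P[0]))]


def one_element_matrix(c):
    """One-element P-level matrix with states ordered (L, U).

    L is absorbing; an item in U moves to L with probability c.
    """
    return [[1.0, 0.0],
            [c, 1.0 - c]]


def state_probs_after(p_start, P, n):
    """State probabilities after n presentations, starting from p_start."""
    p = list(p_start)
    for _ in range(n):
        p = vec_mat(p, P)
    return p


# Hypothetical values: the item starts unlearned, c = 0.25, three presentations.
probs = state_probs_after([0.0, 1.0], one_element_matrix(0.25), 3)
print(probs)  # [Pr(L), Pr(U)]; Pr(U) equals (1 - c) ** 3
```

The same row-vector-times-matrix step carries over unchanged to the
list-level matrices; only the size of the state space grows.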

A reformulation of the all-or-none multi-level model can be
accomplished in terms of the state space $\mathcal{S}$ and the M stochastic matrices,
$\mathbb{P}_1, \ldots, \mathbb{P}_M$, defined in the preceding paragraph. An additional discussion
of this reformulation is presented in Chapter 5, pp. 98-105. When the
reformulation is done, it is much easier to state properties of the
model than for the more conventional axiomatization of the preceding
chapter. To illustrate, suppose M = 2. Then $\mathcal{S}$ = {(U,U), (U,P), (P,U),
(P,P), (R,R)}. If the items are $S_1$ and $S_2$, we have

(4.2), (4.3) [The two 5 x 5 matrices $\mathbb{P}_1$ and $\mathbb{P}_2$ are illegible in this copy.
Their rows and columns are indexed by the states (R,R), (P,P), (P,U),
(U,P), (U,U); the legible entries are 0, 1-c, and 1-r-p.]

Now, for example, to verify commutativity for the model with M = 2 (Theorem
3.5, p. 35), one merely needs to show that $\mathbb{P}_1 \cdot \mathbb{P}_2 = \mathbb{P}_2 \cdot \mathbb{P}_1$. The result
is as follows:

(4.4) $\mathbb{P}_1 \cdot \mathbb{P}_2 = \mathbb{P}_2 \cdot \mathbb{P}_1 =$ [product matrix illegible in this copy]

The basic idea of verifying that the $\mathbb{P}_i$ matrices commute provides
the substance of Definition 4.?. Models which have this commuting
property are much easier to work with than non-commuting models.

In addition to facilitating analysis of the model, the preceding
formulation has another possible advantage. This advantage is that the
model is stated in terms of the theoretical quantities $\mathcal{S}$ and the M
matrices, which depend in no way on boundary conditions such as the
presentation schedule or the level of data analysis. In other words, the
stochastic process used to account for data in a particular experiment
is not the model itself but a derivation from the model coupled with the
particular presentation schedule and the level of data analysis. A
model stated in this way can receive support from two sources: (1) its
ability to make detailed predictions in a fixed situation (fixed
schedule and level of analysis), and (2) its ability to account for the
data in a number of different experiments in which both presentation
schedule and level of data analysis vary. One illustration of the way
boundary conditions are coupled with a model to derive a stochastic
process for a fixed level of analysis is reported in Chapter 5, pp. 104-105.
Theorems 4.2 and 4.3 are used for the all-or-none multi-level model
(M = 2). The anticipation presentation schedule is assumed and the level
of data analysis is chosen as the error-success process on the first
appearing item in a cycle, i.e., regardless of which of the two items
is presented first on a cycle, the result of that trial is entered in
the error-success protocol.

The preceding development is designed to preview the framework to
be formalized in this chapter. It turns out that the framework is
applicable to the analysis of many extant models which postulate item
dependencies in the course of list learning. Before presenting the
framework, the classes of models which can be axiomatized in terms of
the framework will be organized around the types of item dependencies
they postulate. This digression into other models has several
motivations. First, it is designed to show that the framework to be developed
has wide applicability to extant models. Second, it is the writer's
feeling that more and more of the recent mathematical learning models
are embodying some item dependencies in their assumptions (e.g., memory
models). Thus it is becoming less and less often that models assume the
learning of S-R pairs proceeds independently. It appears that one
consequence of this tendency is that some methods of model analysis other
than the traditional P-level analysis for the anticipation procedure
are in order. With the knowledge that the case for this trend in
mathematical learning theory can be made only by weight of evidence, we turn
to this task.

If learning a list is presumed to take place on the P-level (level
of individual items), then it is convenient to view each separate
subject-item error-success (1-0) protocol as a sample path from some
stochastic process whose sample space consists of all strings of 1s and
0s (cf. Atkinson, Bower, and Crothers, 1965, pp. 82-85). If, on the
other hand, the assumption of subject-item independence seems unrealistic,
then this analysis is, at best, only approximately correct. A survey
of some of the literature on mathematical learning models reveals that
there are at least three distinct types of item dependencies postulated
by models. This section presents a discussion of these three theoretical
types of item dependencies, and then the framework, which is designed
to incorporate the possibility of all three, is formalized.

The first type of item dependency postulated by some models is that
response probabilities for a presented item may depend not solely on the
state of the presented item but also on the states of the other items
in the list. The mixed model of Atkinson and Estes (1963) provides an
example. In this model transitions among states for the presented items
are independent of the states of unpresented items, but response
probability to items in the unlearned state is determined by the states of
all items in the list. The work of Friedman is related to the mixed
model (Friedman and Gelfand, 1964; Friedman et al., 1966). In the
Friedman et al. paper, a three-state Markov learning model on
stimulus patterns is postulated, and a number of complex response rules
involving stimulus components are developed.

Ruskin (unpublished doctoral dissertation) has analyzed the learning
of concept stimuli composed of three two-valued dimensions in terms of
models which assume that learning proceeds independently for each item,
but that response probabilities to items in unlearned states depend on
the states of all items in the list. He has had some success in
accounting for differential numbers of errors to each stimulus in such problems.

The second type of item dependency postulated by models is that the
state of an item can change on trials when it is not presented. The
concept learning model of Restle (1961) fits into this category. Strictly
speaking this hypothesis model has the property that the states of each
item may or may not change when a new hypothesis is sampled. The usual
all-or-none two-state model presented by Restle (1961) and also by
Bower and Trabasso (1964) represents the process of concept learning in
a much more simplified manner than their theory implies. They
accomplish this by lumping the states of certain Markov chains implied by the
theory. Even this simplified model has the property that items not
presented can shift to the learned state.

More recently Restle (1962, 1964) has proposed a strategy-selection
theory for paired-associate learning. The theory supposes that two
similar items requiring dissimilar responses may become confused. Confusion
is represented in the theory by certain mnemonic devices or strategies
which the subject might use to retrieve an S-R pair from memory, e.g.,
if two S-R pairs in a list were AB-1, AC-2, then the strategy A-1 would
result in confusion between AB and AC. It is a consequence of Restle's
theory that an unpresented item, say AC in the above miniature list,
can change its state when another item, AB, is presented. In Chapter 5,
p. 108, Restle's model will be analyzed in detail using the framework
to be developed in this chapter.

The all-or-none multi-level model presented in the previous chapter
is another example of a model that allows states of items to change on
trials when they are not presented. An additional analysis of this
model in terms of the framework will be given in Chapter 5, p. 98.

A fourth example is the trial-dependent-forgetting model (T.D.F.
model) of Atkinson and Crothers (1964) and Calfee and Atkinson (1965).
In this model an item in a short-term memory state can be bumped into a
forgotten state as a consequence of the presentation of another unlearned
item.

The third type of inter-item dependency postulated by models is
that the state of a particular unpresented item can influence the
transition probabilities for the presented item as well as other unpresented
items. One example of this dependency is the Buffer Models of Atkinson
and Shiffrin (1965). In these models the probability that an item will
enter the short-term memory buffer depends on the number of other items
already in the buffer. Similarly, whether or not an item in the buffer
is dropped on a certain trial depends on how many other items are in
the buffer. In most applications, however, the buffer is assumed to
be full.

A second example is the two-person game situation discussed in
Suppes and Atkinson (1960). Player A can be in response state $A_1$ or
$A_2$, and the transition probabilities depend on the response state of
player B in the sense that the states of both players determine the
payoff probabilities, and the payoff determines, in turn, the transition
probabilities.

A third example comes from a slight generalization of the all-or-none
multi-level model presented in the last chapter. Suppose the
probability of rule learning when all items are in U is r, but the
probability of rule learning when any item is in P is c ≠ r. Then a
presented item in state U would have rule learning parameter r or
c depending on the states of other items in the list.

In each of the examples presented, the probability of a response
$A_j$ to a presented item $S_i$ on trial n, $\Pr(A_{j,n} \mid S_{i,n})$, depends not
only on the number of previous presentations of the item but also in
some way on the number and positioning of presentations of items other
than $S_i$. This seems to suggest that a useful testing ground for models
embodying item dependencies is in experiments where the presentation
orders are manipulated and predictions of the probabilities of various
responses are made. It seems to this writer that the ability of a model
to account for various patterns of response probability as a function of
controlled presentation orders is every bit as strong a test of a model
as its ability to account for subject-item error-success protocols in
an anticipation procedure experiment. Of course the preceding remark
assumes that the model makes differential predictions of response
probability as a function of presentation order.

This presentation order approach to testing models which imply item
dependencies has already been used by many, e.g., the miniature RTT
paired-associate experiments (Estes, Hopkins, and Crothers, 1960; Izawa, 1965;
Young, unpublished doctoral dissertation), the work on optimization
(Suppes, 1964; Crothers, 1965; Groen and Atkinson, in press), and work
with memory models for paired-associate learning (Greeno, 1966; Atkinson
and Shiffrin, 1965; Bjork, unpublished doctoral dissertation).

The Framework 

A. History of Major Ideas 

In this section a framework providing a possible synthesis of
models which permit any of these three types of dependencies is developed.
A number of general theorems for predicting state probabilities and
response probabilities as a function of presentation sequence will be
presented. The main theoretical quantity in the framework will be the
state of the entire list rather than the more usual state of an item.
The state of the list will be represented by a vector of states of the
items in the list. Each item in the list will be characterized by a
matrix of transition probabilities from states of the list to states of
the list. A matrix associated with an item will be effective whenever
that item is presented on a trial, i.e., to compute state probabilities
on trial N+1 one applies the matrix operator associated with the item
presented on trial N to the vector of probabilities of being in the
various states of the list on trial N. A trial is defined to be the
presentation of an item for a response, followed immediately by the
item paired with its correct response, i.e., a trial here is equivalent
to a usual anticipation procedure trial.

One precursor to the idea of defining a state of a list in terms
of the states of the items in the list is found in Estes (1959). In
developing general properties of component and pattern models, Estes
suggests that one could define a state for a one-element
paired-associate model in terms of the number of unlearned items in the list. The
derivation on p. 56 of his chapter assumes the anticipation procedure.
He shows how one can derive the probability of a correct response at
the beginning of cycle n from a matrix whose states are the number of
unlearned items, i.e., if the list has M stimuli, the states are
0, 1, ..., M. Estes' idea for treating the state of the list for the
one-element model is generalized in this paper to apply to any model in the
framework (Theorem 4.3).

A second example of the idea of combining states of various items
into a single state is found in Atkinson and Estes (1963). In
their chapter on stimulus sampling theory, they develop the mixed
model for a two-item list. The items, ab and ac, are assumed to be
either in an unlearned state U or a learned state L. They develop the
theory for a four-state process with states (U,U), (U,L), (L,U), and
(L,L), where the first position refers to the state of item ab and
the second to item ac. More will be said about this work in Chapter 5.



The idea that presentations of different items can be represented
by different sets of transition probabilities among states of the entire
list of items has been, in part, adopted by Restle (1962, 1964). His
strategy-selection theory of paired-associate learning assumes that
items in a list can be confused in the course of learning. This
confusion results in a discrimination problem which is solved by discarding
strategies that confuse items requiring dissimilar responses. Restle
does not allow for different items to have different transition
probabilities in his applications of strategy-selection theory (cf. Restle,
1964, Sec. 5, pp. 132-144); however, he points out that his applications
are at best an approximation (Restle, 1964, pp. 168-171). In the final
pages of the chapter, Restle suggests the direction necessary to take
in order to square the models he uses with his theory. It is these
suggestions of his, rather than his original model, that resemble certain
developments in this chapter. A more detailed analysis of
strategy-selection theory will be presented in Chapter 5 (p. 108) of this paper.

B. Definition of Model in Framework 

In the development to follow each item in a list will be required
to be in one of a finite number of states on any trial of the experiment.
The generalization from usual formulations of models will be to allow
for the possibility for some or all of the items which are not presented
on a certain trial N to do any of the following: (1) affect response
probabilities on trial N; (2) change their own states of conditioning
on trial N; and (3) affect transition probabilities of other items
in the list on trial N.



Suppose a list of M S-R pairs (items), denoted by
$S_1, S_2, \ldots, S_M$, and a set of Q responses, denoted by
$A_1, A_2, \ldots, A_Q$. We will adopt the idea of a state as a primitive notion
in the framework. States U and L in the one-element model, the number of
patterns connected to response $A_1$ in a two-response pattern model, and
U, P, and R in the one-element multi-level model are all states in the
intended usage of "state" in the framework. In Definition 4.1 the notion
of an item state space is presented. It should be noted that, since the
item state space is an ordered set of states, it is possible for a
particular state to appear more than once (with a different subscript) in
the item state space.
Definition 4.1

By a state space, $T_i$, of an item is meant a finite ordered
set of states.

Examples of item state spaces are {U,L} for the one-element
model, $\{C_i\}$ for the N-element two-response pattern model, and
{U,P,R} for the all-or-none multi-level model presented in the
preceding chapter. Next we formalize in Definition 4.2 the notion
of the state space for a list of items.

Definition 4.2

By a state space $\mathcal{S}$ of a list of M items with item state
spaces $T_1, T_2, \ldots, T_M$ is meant the M-fold Cartesian product

$\mathcal{S} = T_1 \times T_2 \times \cdots \times T_M$,

where "$\times$" is the Cartesian product of sets.

For the one-element P-level model with a list of M items

(4.5) $\mathcal{S} = \{\vec{t} = (t_1, t_2, \ldots, t_M) : t_i \in \{U,L\},\ i = 1, 2, \ldots, M\}$.

Thus if the item state space $T_i$ has L states, the state space for a
list with M items will have $L^M$ members.
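
The list state space of Definition 4.2 can be enumerated directly. The
following Python sketch is only an illustration; the function name is a
hypothetical choice, and the order of enumeration is simply the one that
`itertools.product` happens to produce.

```python
from itertools import product


def list_state_space(item_states, M):
    """All M-tuples of item states: the M-fold Cartesian product."""
    return list(product(item_states, repeat=M))


# One-element model with M = 2 items, item state space (L, U).
states = list_state_space(("L", "U"), 2)
print(states)       # every pair of item states
print(len(states))  # L ** M with L = 2 item states and M = 2 items
```

Fixing one enumeration order is what allows a state of the list to be
referred to by a single index, as the matrices of Definition 4.3 require.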

We will next define a model for the learning of a list (Definition
4.3). This definition will require that the stochastic process governing
state-to-state transitions among the states of $\mathcal{S}$ be Markov in a certain
sense. The Markov restriction is not thought to be too severe because in
many non-Markov models the state space could be expanded to make the model
satisfy the Markov condition. Disregarding the restriction of a finite state
space for the moment, the identifiable state theory developed in Greeno
and Steiner (p. 317) illustrates one way in which this expansion
can be accomplished.

Although the restriction to a finite-state model and the Markov
condition rule out certain models, like the linear model which requires
an infinite state space to satisfy the Markov condition, generalization
of the present approach to include these models should be possible. 4/

4/ For example, response probability might be used to define a state, and
operators for each item could be postulated.

We next present Definition 4.3.

Definition 4.3

Suppose $\mathcal{L} = \{S_1, S_2, \ldots, S_M\}$ is a list of M items with
associated response set $\mathcal{A}$ of size Q. Then $\mathcal{M} = (\mathcal{S}, \mathcal{P}, \mathcal{R})$ is a
model for the learning of list $\mathcal{L}$ in case

1. There is an item state space $T_1 = T_2 = \cdots = T_M$ such that
$\mathcal{S} = T_1 \times \cdots \times T_M$ is a state space for the list $\mathcal{L}$.

2. $\mathcal{P}$ is a set of M square $L^M \times L^M$ matrices, $\mathbb{P}_1, \mathbb{P}_2, \ldots, \mathbb{P}_M$,
such that, for all i, j = 1, 2, ..., $L^M$ and $\alpha$ = 1, 2, ..., M, the
ij-th term of $\mathbb{P}_\alpha$, namely $P^{\alpha}_{ij}$, is the probability of
transition from state $\vec{t}_i$ of the list to state $\vec{t}_j$ on a trial when
item $S_\alpha$ is presented. These transition probabilities depend
only on i, j, $\alpha$, and not on the trial index or preceding
states of the list, i.e., for all trials N, stimuli $S_\alpha \in \mathcal{L}$,
states of the list $\vec{t}_i, \vec{t}_j \in \mathcal{S}$, and past histories of
presentations, responses, and states, h, we have

$\Pr(\vec{t}_{j,N+1} \mid S_{\alpha,N}, \vec{t}_{i,N}, h) = P^{\alpha}_{ij}$.

3. $\mathcal{R}$ is a function which specifies, for each stimulus $S_\alpha \in \mathcal{L}$,
response $A_j \in \mathcal{A}$, and state of the list $\vec{t}_i \in \mathcal{S}$,

$\Pr(A_{j,N} \mid S_{\alpha,N}, \vec{t}_{i,N})$

independent of the trial number N (= 1, 2, ...).

Naturally if $\mathcal{M}$ is a model we have, for all $\alpha$ = 1, 2, ..., M and
i = 1, 2, ..., $L^M$,

$\sum_{j=1}^{L^M} P^{\alpha}_{ij} = 1$,

and, for all $\alpha$ = 1, 2, ..., M and $\vec{t} \in \mathcal{S}$,

$\sum_{j=1}^{Q} \Pr(A_j \mid S_\alpha, \vec{t}) = 1$.



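Consider, as an example, the one-element P-level model for a list
of M items. The state space for this model is defined in Eq. 4.5, and
the model specifies that transitions are possible only for the presented
item. Denote by $t_\beta$ the $\beta$-th component of $\vec{t}$, where $\vec{t} \in \mathcal{S}$. Suppose
$\vec{t}, \vec{w} \in \mathcal{S}$ agree in every coordinate $\beta$ with $\beta \neq \alpha$. Then

(4.6) $\Pr(\vec{w}_{N+1} \mid S_{\alpha,N}, \vec{t}_N) = \begin{cases} 1 & \text{if } t_\alpha = L \text{ and } w_\alpha = L, \\ 0 & \text{if } t_\alpha = L \text{ and } w_\alpha = U, \\ c & \text{if } t_\alpha = U \text{ and } w_\alpha = L, \\ 1-c & \text{if } t_\alpha = U \text{ and } w_\alpha = U, \end{cases}$

and, if $\vec{t}$ and $\vec{w}$ disagree in some coordinate $\beta$ with $\beta \neq \alpha$,

$\Pr(\vec{w}_{N+1} \mid S_{\alpha,N}, \vec{t}_N) = 0$.

The response rule for the one-element model is generally stated in
terms of a correct response and an incorrect response. Let $A_\alpha$ be the
response associated with $S_\alpha$ and $\bar{A}_\alpha = \mathcal{A} - \{A_\alpha\}$. Then, for all
$\alpha$ = 1, 2, ..., M, $\vec{t} \in \mathcal{S}$, and N = 1, 2, ...,

(4.7) [The display giving $\Pr(A_{\alpha,N} \mid S_{\alpha,N}, \vec{t}_N)$ is illegible in this copy.]

and, of course,

$\Pr(\bar{A}_{\alpha,N} \mid S_{\alpha,N}, \vec{t}_N) = 1 - \Pr(A_{\alpha,N} \mid S_{\alpha,N}, \vec{t}_N)$.

To further clarify these abstractions assume the one-element model
list with M = 2. The members of $\mathcal{S}$ are (U,U), (U,L), (L,U), and
(L,L), where the first member is the state of $S_1$ and the second is
the state of $S_2$. According to Definition 4.3 we have

$$\mathbb{P}_1 = \begin{array}{c|cccc} & (L,L) & (L,U) & (U,L) & (U,U) \\ \hline (L,L) & 1 & 0 & 0 & 0 \\ (L,U) & 0 & 1 & 0 & 0 \\ (U,L) & c & 0 & 1-c & 0 \\ (U,U) & 0 & c & 0 & 1-c \end{array}$$

$$\mathbb{P}_2 = \begin{array}{c|cccc} & (L,L) & (L,U) & (U,L) & (U,U) \\ \hline (L,L) & 1 & 0 & 0 & 0 \\ (L,U) & c & 1-c & 0 & 0 \\ (U,L) & 0 & 0 & 1 & 0 \\ (U,U) & 0 & 0 & c & 1-c \end{array}$$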
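
The construction of the list-level matrices for the one-element model
can be sketched in Python for any M. This is an illustration under
stated assumptions rather than part of the original development: the
function names, the 0-based item indices, and the value c = 0.25 are
hypothetical, and the states are ordered (L,L), (L,U), (U,L), (U,U) for
M = 2. The sketch also checks that each row sums to one and that the two
matrices commute.

```python
from itertools import product


def one_element_list_matrices(M, c):
    """Return (states, matrices) for the one-element model on M items.

    states   : list of M-tuples over ("L", "U"), in itertools.product order
    matrices : matrices[a][i][j] is the probability of moving from
               states[i] to states[j] when item a (0-based) is presented
    """
    states = list(product(("L", "U"), repeat=M))
    index = {s: k for k, s in enumerate(states)}
    n = len(states)
    matrices = []
    for a in range(M):
        P = [[0.0] * n for _ in range(n)]
        for i, t in enumerate(states):
            if t[a] == "L":
                P[i][i] = 1.0             # a learned item stays learned
            else:
                learned = t[:a] + ("L",) + t[a + 1:]
                P[i][index[learned]] = c  # the presented item is learned
                P[i][i] = 1.0 - c         # or it stays unlearned
        matrices.append(P)
    return states, matrices


def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def close(A, B, tol=1e-12):
    """True if two matrices agree entry by entry within tol."""
    return all(abs(x - y) <= tol
               for row_a, row_b in zip(A, B)
               for x, y in zip(row_a, row_b))


states, (P1, P2) = one_element_list_matrices(2, 0.25)
rows_ok = all(abs(sum(row) - 1.0) < 1e-12 for P in (P1, P2) for row in P)
commute = close(mat_mul(P1, P2), mat_mul(P2, P1))
print(rows_ok, commute)
```

Only the coordinate of the presented item is allowed to change in each
row, which is the property that Eq. 4.6 expresses.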

One effect of the preceding definitions is to allow us to view a
theory for list learning as a set of M matrices of transition
probabilities among the states of $\mathcal{S}$. The device of dealing with
$\mathcal{S}$ permits one to handle the possibility of simultaneous
learning on various levels. To illustrate, suppose the all-or-none
multi-level model is written for a list with M = 2. Then the implied
matrices $\mathbb{P}_1$ and $\mathbb{P}_2$ are given by Eqs. 4.2 and 4.3. The response
rule for the all-or-none multi-level model specifies that response
probabilities are completely determined by knowing the state of the
presented item. In general, item dependencies implied by multi-level
learning are recorded by their effect on $\vec{t} \in \mathcal{S}$.

C. Definitions and Theorems for Presentation Schedules

Now that the notion of a model in the framework has been formalized,
we move to the task of stating theorems for computing state and response
probabilities as a function of the presentation schedule. These theorems
are motivated by the idea that a multi-level model can be tested by
manipulating presentation sequences and predicting response probabilities
as a function of this manipulation. Before stating the theorems of this
section, one more definition is needed.

The formulation of the notion of a model in the preceding section
did not include a specification of the probabilities of being in the
various states of the list on trial one, i.e., Definition 4.3 did not
include a start vector. In order to apply a model to a particular
experiment a start vector must either be assumed by the model, or the
probabilities of starting in the various states must be regarded as
parameters of the model. The notion of a start vector is formalized in
Definition 4.4.

Definition 4.4

By a start vector $\vec{p}_1$ for a model $(\mathcal{S}, \mathcal{P}, \mathcal{R})$ is meant an
$L^M$-dimensional row vector of the probabilities of being in the $L^M$
states in $\mathcal{S}$ at the start (trial one) of an experiment.

In general, denote by $\vec{p}_N$ the row vector of probabilities of being
in the various states on trial N of an experiment. $\vec{p}_N$ can be
viewed as a random variable whose value depends on the start vector $\vec{p}_1$,
the matrices $(\mathbb{P}_1, \ldots, \mathbb{P}_M)$, and the presentation schedule.

Next we present Theorems 4.1, 4.2, and 4.3, which give general
methods for computing $\vec{p}_N$ as well as response probabilities on trial N
under some frequently employed presentation schedules. The first
theorem shows how to compute $\vec{p}_{N+1}$ given a fixed presentation sequence of
stimuli for the first N trials. Theorem 4.1 should be regarded as a
fairly obvious extension of a standard theorem in Markov chain theory.
The theorem from Markov chain theory asserts that if $\mathbb{P}$ is the
transition matrix for a finite-state Markov chain, the probability of being in
state j N trials after being in state i is given by the ij-th term
of $\mathbb{P}^N$ (cf. Kemeny, Mirkil, Snell, and Thompson, 1959, p. 586).
Theorem 4.1 is a special case of the analogous theorem for inhomogeneous
finite-state Markov chains (i.e., chains whose parameters are trial
dependent).



Theorem 4.1

Suppose a list, $\mathcal{L}$, of M items, a model $(\mathcal{S}, \mathcal{P}, \mathcal{R})$
with an associated start vector $\vec{p}_1$. Also suppose the presentation
sequence $S_{\alpha_1}, S_{\alpha_2}, \ldots, S_{\alpha_N}$, for $S_{\alpha_i} \in \mathcal{L}$, i = 1, 2, ..., N,
is administered for the first N trials. Then the row vector of
probabilities of being in the various states of the list on trial
N + 1 is given by

(4.10) $\vec{p}_{N+1} = \vec{p}_1 \prod_{i=1}^{N} \mathbb{P}_{\alpha_i}$,

where the matrices are multiplied in the order of presentation.

Proof

The proof proceeds by induction on N. Clearly for N = 1

$\vec{p}_2 = \vec{p}_1 \mathbb{P}_{\alpha_1}$.

Assume for N - 1 that

$\vec{p}_N = \vec{p}_1 \prod_{i=1}^{N-1} \mathbb{P}_{\alpha_i}$.

Then the k-th term of $\vec{p}_{N+1}$ is given by

$(\vec{p}_{N+1})_k = \sum_{j=1}^{L^M} (\vec{p}_N)_j \, P^{\alpha_N}_{jk}$,

so that $\vec{p}_{N+1} = \vec{p}_N \mathbb{P}_{\alpha_N} = \vec{p}_1 \prod_{i=1}^{N} \mathbb{P}_{\alpha_i}$.

Although the preceding theorem is not suited for hand computation
of any but the most simple models with M and L small, it could
provide a useful tool in computer simulation of more complex multi-level
models.
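
As a hedged illustration of the kind of computation Theorem 4.1 suggests,
the Python sketch below applies the matrix of each presented item, in
order, to a start vector. It uses the one-element model with M = 2 and
the state order (L,L), (L,U), (U,L), (U,U); the function names, the
0-based item indices, the value c = 0.25, and the presentation sequence
are all assumptions made for the example.

```python
def vec_mat(p, P):
    """Multiply a row vector p by a matrix P (both plain Python lists)."""
    return [sum(p[i] * P[i][j] for i in range(len(p)))
            for j in range(len(P[0]))]


def state_probs(p1, matrices, sequence):
    """Theorem 4.1: apply the matrix of each presented item, in order.

    p1       : start vector (trial one)
    matrices : matrices[a] is the transition matrix for item a (0-based)
    sequence : items presented on trials 1..N, as 0-based indices
    Returns the state probabilities on trial N + 1.
    """
    p = list(p1)
    for a in sequence:
        p = vec_mat(p, matrices[a])
    return p


c = 0.25  # hypothetical learning parameter
# One-element model, M = 2, states ordered (L,L), (L,U), (U,L), (U,U).
P1 = [[1, 0, 0, 0], [0, 1, 0, 0], [c, 0, 1 - c, 0], [0, c, 0, 1 - c]]
P2 = [[1, 0, 0, 0], [c, 1 - c, 0, 0], [0, 0, 1, 0], [0, 0, c, 1 - c]]
p1 = [0.0, 0.0, 0.0, 1.0]  # both items start unlearned

# Presentation sequence S1, S1, S2 on trials 1 to 3.
p4 = state_probs(p1, [P1, P2], [0, 0, 1])
print(p4)  # state probabilities on trial 4
```

For models whose matrices do not commute, the order of the entries in
`sequence` matters, which is why the loop follows the presentation order.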

Although this theorem and the ones to follow concern how to derive
the probabilities of being in the various states given various
presentation sequences, it is quite easy to use these results to get response
probabilities. Suppose $s_{N-1}$ is the sequence of presentations for the
first N-1 trials; then, for all $S_\alpha \in \mathcal{L}$, $A_j \in \mathcal{A}$, $\vec{t}_k \in \mathcal{S}$, and
N = 1, 2, ...,

(4.11) $\Pr(A_{j,N} \mid S_{\alpha,N}, s_{N-1}) = \sum_{k=1}^{L^M} \Pr(A_{j,N} \mid S_{\alpha,N}, \vec{t}_{k,N}) \, \Pr(\vec{t}_{k,N} \mid s_{N-1})$.

The first term is given by $\mathcal{R}$ and the second is the k-th component of $\vec{p}_N$
calculated by Theorem 4.1. In later theorems $E(\vec{p}_N)$ will be computed
under various presentation schedules. If $\mathcal{J}$ is a presentation schedule
(see Definition 4.5), we have

(4.12) $E_{\mathcal{J}}(\Pr(A_{j,N} \mid S_{\alpha,N})) = \sum_{k=1}^{L^M} \Pr(A_{j,N} \mid S_{\alpha,N}, \vec{t}_{k,N}) \, E_{\mathcal{J}}((\vec{p}_N)_k)$.
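
The sum in Eq. 4.11 is a weighted average of response probabilities over
the states of the list. The Python sketch below illustrates it for the
one-element model with M = 2. The response rule used here (a correct
response with probability 1 when the presented item is in L and with a
guessing probability g when it is in U) is an assumption made for the
example, as are the function name and the numerical values.

```python
def response_prob(p_N, response_given_state):
    """Eq. 4.11: sum over list states of Pr(response | state) * Pr(state)."""
    return sum(r * p for r, p in zip(response_given_state, p_N))


g = 0.5  # hypothetical guessing probability for an unlearned item

# State probabilities on trial N, states ordered (L,L), (L,U), (U,L), (U,U).
p_N = [0.109375, 0.328125, 0.140625, 0.421875]

# Probability of the correct response to S1 in each state of the list:
# S1 is learned in the first two states and unlearned in the last two.
correct_S1 = [1.0, 1.0, g, g]

pr_correct = response_prob(p_N, correct_S1)
print(pr_correct)
```

A response rule that depends on the states of unpresented items, as in
the mixed model, would change only the list `correct_S1`.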












Next we define the notion of a presentation schedule generator
(p.s.g.) and state a lemma from Theorem 4.1 for finding $E(\vec{p}_N)$ under an
arbitrary p.s.g.

Definition 4.5

Suppose a list of M items, $\mathcal{L}$. By a presentation schedule
generator, $\mathcal{J}$, is meant a rule which specifies the following
probabilities:

1. For all presentations on the first trial, $S_\alpha \in \mathcal{L}$, $\Pr(S_{\alpha,1})$ is
specified.

2. Let $\mathcal{L}^N$ denote $\mathcal{L} \times \cdots \times \mathcal{L}$ (N times) and let $h_N$ denote the
history of the first N presentations and responses. Then, for all
$S_\alpha \in \mathcal{L}$, all histories $h_N$, and all N = 1, 2, ...,

$\Pr(S_{\alpha,N+1} \mid h_N)$

is specified.

Thus a p.s.g. is just a rule for determining the probability of any
sequence of presentations through the first N trials (possibly
contingent on the subject's responses) for N = 1, 2, ... .

Lemma 4.1

Suppose a p.s.g. $\mathcal{J}$ for a list of M items and a model $(\mathcal{S}, \mathcal{P}, \mathcal{R})$
with associated start vector $\vec{p}_1$. Then the expected probabilities
of being in the various states of the list on trial N (expectation with
respect to $\mathcal{J}$) are given by

$E_{\mathcal{J}}(\vec{p}_N) = \sum_{s_{N-1} \in \mathcal{L}^{N-1}} \Pr(s_{N-1}) \, \vec{p}_1 \mathbb{P}_{\alpha_1} \mathbb{P}_{\alpha_2} \cdots \mathbb{P}_{\alpha_{N-1}}$,

where $s_{N-1}$ is expressed as $S_{\alpha_1}, \ldots, S_{\alpha_{N-1}}$.

Proof

With Theorem 4.1 and the treatment of $\vec{p}_N$ as a random variable in
mind, the lemma follows from the fact that if X and Y are discrete
random variables

$E(X) = \sum_{y} E(X \mid Y = y) \Pr(Y = y)$,

where the sum is over y such that Pr(Y = y) > 0.

Next we introduce the notion of a "Bernoulli presentation schedule."
A theorem is then stated for computing $E_B(\vec{p}_N)$ for a Bernoulli p.s.g. B.
It turns out that for computational purposes it is useful to test a
multi-level model with a Bernoulli presentation schedule. If this
schedule is used, an expected operator or average transition matrix can be
used to get state probabilities (Theorem 4.2), and a theorem which
permits lumping of the average matrix (Theorem 4.3) under a further
restriction greatly reduces the number of states in this matrix. These two
theorems are used together to derive stochastic matrices for the
all-or-none multi-level model and for Restle's strategy-selection theory (Chapter
5, p. 103 and p. 115, respectively).

Definition 4.6

Suppose a list of M items, $\mathcal{L}$. By a Bernoulli presentation
schedule is meant a rule $\mathcal{J}$ which selects item $S_\alpha \in \mathcal{L}$ to be
presented on trial N with probability $\pi_\alpha$ independent of N and
the previous item presentations and responses, $\alpha$ = 1, 2, ..., M.
That is, for all $\alpha$ = 1, 2, ..., M, N = 1, 2, ...,

$\Pr(S_{\alpha,N} \mid h_{N-1}) = \pi_\alpha$

independent of N and the history $h_{N-1}$. Of course

$\sum_{\alpha=1}^{M} \pi_\alpha = 1$.



Theorem 4.2 

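Suppose a list of M items, a model $\mathcal{M} = (\mathcal{T}, \mathcal{P}, \mathcal{X})$ with
associated start vector $\vec{p}_1$ and a Bernoulli p.s.g. B =
$\{\pi_1, \pi_2, \ldots, \pi_M\}$. Define

$$A = \sum_{k=1}^{M} \pi_k P_k ,$$

with $P_k \in \mathcal{P}$ the transition matrix associated with item $S_k$,
to be a matrix of "average" transition probabilities effective on
any trial. Then

$$E_B(\vec{p}_N) = \vec{p}_1 A^{N-1} . \tag{4.14}$$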
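As a rough numerical check of the average-matrix idea, the sketch below enumerates every presentation sequence for a two-item list and compares the weighted sum of state vectors with repeated application of the average matrix. The two transition matrices are an assumption: they are the one-element P-level matrices implied by the average matrix shown later in this chapter (Eq. 4.24), with states ordered (L,L), (L,U), (U,L), (U,U). All function names are illustrative only.

```python
from itertools import product


def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[k] * m[k][j] for k in range(len(v))) for j in range(len(m[0]))]


def p_level_matrices(c):
    """Assumed one-element P-level matrices for M = 2.

    State order: (L,L), (L,U), (U,L), (U,U); item 1 is the first coordinate.
    """
    p1 = [[1, 0, 0, 0], [0, 1, 0, 0], [c, 0, 1 - c, 0], [0, c, 0, 1 - c]]
    p2 = [[1, 0, 0, 0], [c, 1 - c, 0, 0], [0, 0, 1, 0], [0, 0, c, 1 - c]]
    return [p1, p2]


def expected_state_by_enumeration(p_start, mats, probs, n_trials):
    """Sum Pr(sequence) * p_start * P_a1 ... P_an over all sequences."""
    expected = [0.0] * len(p_start)
    for seq in product(range(len(mats)), repeat=n_trials):
        weight = 1.0
        state = list(p_start)
        for k in seq:
            weight *= probs[k]          # Bernoulli schedule: independent draws
            state = vec_mat(state, mats[k])
        expected = [e + weight * s for e, s in zip(expected, state)]
    return expected


def expected_state_by_average(p_start, mats, probs, n_trials):
    """Apply the average matrix A = sum_k pi_k P_k n_trials times."""
    size = len(p_start)
    avg = [[sum(probs[k] * mats[k][i][j] for k in range(len(mats)))
            for j in range(size)] for i in range(size)]
    state = list(p_start)
    for _ in range(n_trials):
        state = vec_mat(state, avg)
    return state
```

Both routes should agree up to rounding for any schedule probabilities that sum to one; the enumeration grows as M^N, which is the practical reason for preferring the average matrix.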



Proof 



The proof proceeds by induction on N. For N = 2,

$$E_B(\vec{p}_2) = \pi_1 \vec{p}_1 P_1 + \cdots + \pi_M \vec{p}_1 P_M = \vec{p}_1 \sum_{k=1}^{M} \pi_k P_k = \vec{p}_1 A .$$

Assume

$$E_B(\vec{p}_N) = \vec{p}_1 A^{N-1} .$$

Then for all strings $\alpha = \alpha_1, \ldots, \alpha_N$ of the first M integers
we have

$$\Pr(\alpha) = \pi_{\alpha_1} \pi_{\alpha_2} \cdots \pi_{\alpha_N} ,$$

so that, by Lemma 4.1,

$$E_B(\vec{p}_{N+1}) = \sum_{\alpha} \pi_{\alpha_1} \cdots \pi_{\alpha_N}\, \vec{p}_1 P_{\alpha_1} \cdots P_{\alpha_N} = \vec{p}_1 A^{N-1} \sum_{k=1}^{M} \pi_k P_k = \vec{p}_1 A^{N} .$$

This theorem could alternatively be proven as a consequence of the
theorem which states that the expectation of a product of independent
random variables is the product of their expectations. Then the work
would be to show that the conditions of this theorem are satisfied for
matrix random variables and a Bernoulli p.s.g. along with the model.

In the mathematical learning theory literature, the two most frequent
experimental paradigms for list learning are the anticipation procedure
and the R-T procedure. The anticipation procedure presents no
difficulty for the framework. Thus an anticipation procedure for a list
of M items could be defined in terms of a p.s.g. which selects any of
the M! orders of the M items with probability 1/M! at the start of
each cycle.

The R-T procedure, however, presents slightly more difficulty for
the framework. The problem is that a trial in the R-T procedure does
not fit the definition on p. 54 of this chapter. Instead of the stimulus
being presented for a response followed immediately by a presentation of
the S-R pair, the R-T procedure groups the R-trials (presentations
of S-R pairs), groups the T-trials (presentations of the S members
for a response), and alternates blocks of R and T trials.

There is a fairly simple and natural way to extend the framework to
handle the R-T procedure as well as several other situations to be
mentioned. Suppose an event, $E_\nu$, is defined to be any occurrence in
an experiment which a theory says may affect the state of the list. Thus
far, theories have been restricted to those which specify that the only
events are presentations of stimuli for an anticipation trial, i.e., thus
far, transitions among states of the list are permitted only upon the
presentation of a stimulus. We could associate a transition matrix $P_\nu$
with each event $E_\nu$. Then, if event $E_\nu$ occurs at some time point N
in the experiment, $P_\nu$ would be applied to $\vec{p}_N$ to give $\vec{p}_{N+1}$, i.e.,

$$\vec{p}_{N+1} = \vec{p}_N P_\nu . \tag{4.15}$$

Now the R-T procedure can easily be accommodated within the framework.
Associated with each type of R-trial, $R_\alpha$, is a matrix, and
with each type of T-trial, $T_\alpha$, a matrix. Thus during a T
cycle the associated item matrices for T-trials are effective, and during
R-trials the matrices for R-trials on items are effective. Viewed
in this way, the issue of learning on test trials is whether or not the
transition matrices for T-trials are diagonal (1s on the diagonal and
0s elsewhere).
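The event-by-event update can be sketched in a few lines. The example below is only an illustration under stated assumptions: a single item with states ordered (U, L), a hypothetical R-trial matrix with learning rate c, and an identity matrix for T-trials (i.e., no learning on test trials). None of these specific matrices come from the text.

```python
def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[k] * m[k][j] for k in range(len(v))) for j in range(len(m[0]))]


def run_events(p_start, event_matrices, events):
    """Apply the matrix of each event in turn: p_{N+1} = p_N P_event."""
    state = list(p_start)
    for event in events:
        state = vec_mat(state, event_matrices[event])
    return state


def example_matrices(c):
    """Hypothetical event matrices for one item, states ordered (U, L)."""
    return {
        "R": [[1 - c, c], [0.0, 1.0]],    # study trial: U -> L with probability c
        "T": [[1.0, 0.0], [0.0, 1.0]],    # test trial: diagonal, so no learning
    }
```

With diagonal T-matrices, inserting test trials between study trials leaves the state probabilities unchanged, which is exactly the "no learning on test trials" hypothesis expressed in matrix form.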

Other places where this generalized notion of event might prove useful
are as follows. Peterson and Peterson (1962) had subjects count
backwards between an R-trial and a T-trial to study memory. If models
for the Petersons' experiment were written in the framework of this
chapter, one could define a counting event, $E_\nu$. Presumably the associated
matrix $P_\nu$ would tend to shift an item into a forgotten state.

A second possible use of the generalization of event in the framework
would be in the optimization work of Crothers (1965). Crothers
considered two types of trials (modes of presenting material to be learned) 
and the paper concerned finding a solution to the problem of the optimal 
scheduling of these trials under various constraints. To solve this 
problem, he associated a transition matrix with each type of event; a 
matrix was assumed to be effective on any trial when its associated event 
occurred. The members of the state space, however, were not the states 
of the list but were states of a particular item. 

It would lead us too far astray to develop additional properties of 
presentation schedules within the multi-level framework. The preceding 
comments should indicate the way to incorporate a p.s.g. into the frame- 
work. 

Before leaving the section on computing state and response probabilities
for multi-level models, there is another property of some models
that can simplify derivations. If the matrices in $\mathcal{P}$ commute, computations
from a model are simplified (Theorem 4.3). First, we formalize
the notion of a commutative model in Definition 4.7 and then state Theorem 4.3.

Definition 4.7

Suppose a list of M items and a model $\mathcal{M}$. $\mathcal{M}$ is said to be a
commutative model in case, for all $\alpha, \beta$ = 1, 2, ..., M,

$$P_\alpha P_\beta = P_\beta P_\alpha .$$



Examples of commutative models are the one-element P-level model
and the all-or-none multi-level model. The former is commutative as a
consequence of the property that each item in the list is learned independently.
To see this effect on the matrices, consider the P-level
model for M = 2. We may compute $P_2 P_1$, where $P_1$ and $P_2$
are given by Eqs. (4.8) and (4.9) respectively. The result is

$$P_2 P_1 = P_1 P_2 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ c & 1-c & 0 & 0 \\ c & 0 & 1-c & 0 \\ c^2 & c(1-c) & c(1-c) & (1-c)^2 \end{bmatrix} .$$

Theorem 5.3 suffices to prove that the all-or-none multi-level model is
commutative. The appropriate matrix multiplication for M = 2 is presented
in Eq. (4.4).
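The commutativity of the two P-level matrices can be checked mechanically. The sketch below uses exact rational arithmetic; the matrices are an assumption, taken to be the one-element P-level matrices for M = 2 with states ordered (L,L), (L,U), (U,L), (U,U), as implied by the average matrix given later in the chapter. Function names are illustrative.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Plain matrix product of two square list-of-lists matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def p_level_matrices(c):
    """Assumed P-level matrices; item 1 is the first coordinate of the state."""
    p1 = [[1, 0, 0, 0], [0, 1, 0, 0], [c, 0, 1 - c, 0], [0, c, 0, 1 - c]]
    p2 = [[1, 0, 0, 0], [c, 1 - c, 0, 0], [0, 0, 1, 0], [0, 0, c, 1 - c]]
    return p1, p2


def commutes(a, b):
    """True when the two matrices commute exactly."""
    return mat_mul(a, b) == mat_mul(b, a)
```

Using `Fraction` for c avoids any floating-point doubt about whether two products are "equal"; the last row of either product is then exactly (c^2, c(1-c), c(1-c), (1-c)^2).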

Commutative models make a strong prediction that the order of presenting
stimuli does not matter. This is shown in Theorem 4.3.

Theorem 4.3 

Suppose a list of M items, a model $\mathcal{M}$ with associated start
vector $\vec{p}_1$. Suppose $\mathcal{M}$ is a commutative model and that, for
$\alpha$ = 1, 2, ..., M, $n_\alpha$ $S_\alpha$ items are presented in the first N
trials, i.e.,

$$\sum_{\alpha=1}^{M} n_\alpha = N .$$

Then $\vec{p}_{N+1}$ is independent of the order of presenting the items.



Proof 



From Theorem 1 and repeated use of commutativity of the members of $\mathcal{P}$
we have, for any presentation order,

$$\vec{p}_{N+1} = \vec{p}_1 P_1^{\,n_1} P_2^{\,n_2} \cdots P_M^{\,n_M} . \tag{4.16}$$
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D. Reduction of the Cardinality of Models 



Before turning to specific applications, one more aspect of the framework
needs to be developed. As the size of the list, M, and the size
of the item state space, L, increase, the size of $\mathcal{T}$, the state space
for the list, increases as $L^M$. Even for a three-stage model for a twenty-item
list, $\mathcal{T}$ would have $3^{20}$ = 3,486,784,401 possible states in its
representation within the framework. Also, working with matrices of the
order of $3.5 \times 10^9$ by $3.5 \times 10^9$ would tax the abilities of the strongest
computer.
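The growth described above is easy to tabulate. The helper below is a trivial sketch (the function name is illustrative) that returns the number of list states and the number of entries in one full transition matrix over those states.

```python
def list_state_count(item_states, list_length):
    """Number of list states when each of M items has L possible states."""
    return item_states ** list_length


def transition_matrix_entries(item_states, list_length):
    """Entries in one full transition matrix over the list state space."""
    n = list_state_count(item_states, list_length)
    return n * n
```

For a three-state item model and a twenty-item list, `list_state_count(3, 20)` gives 3,486,784,401, matching the figure in the text.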

There are, however, several ways to reduce the cardinality of objects
in the framework. Three of these will be developed in the next few pages.
By way of preview, the first will be to drop inaccessible states, the
second will be to break down a list into sublists such that no item
dependencies (or mutual interactions) exist between members of separate
sublists, and the third is to use the notion of lumping states in a Markov
chain (cf. Burke and Rosenblatt, 1958) to effect computational simplicities
in determining state probabilities.

The first of these has already been used in this chapter for the
all-or-none multi-level model with M = 2. States (R, P), (P, R), (R, U),
and (U, R) were dropped in the matrices in Eqs. (4.2) and (4.3). This
is because none of these states is obtainable from other states and, further,
each has zero probability in $\vec{p}_1$. With this in mind we state the following
definition.

Definition 4.8 

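Suppose a list of M items, and a model $\mathcal{M} = (\mathcal{T}, \mathcal{P}, \mathcal{X})$
with associated start vector $\vec{p}_1$. Define the class of null states, $\mathcal{N}$,
to be the largest subset of $\mathcal{T}$ such that:

1. For all $\vec{t} \in \mathcal{N}$, the entry of $\vec{p}_1$ corresponding to $\vec{t}$ is 0.

2. For all $\vec{t} \in \mathcal{N}$, $\vec{u} \in \mathcal{T} - \mathcal{N}$, and $\alpha$ = 1, 2, ..., M,
the probability in $P_\alpha$ of a transition from $\vec{u}$ to $\vec{t}$ is 0.

For the all-or-none multi-level model,

$$\mathcal{N} = \{(R, P), (P, R), (R, U), (U, R)\} .$$

It should be fairly obvious that if the state space of the list is taken
as $\mathcal{T} - \mathcal{N}$, the preceding definitions and theorems are unaffected
in content; hence, from now on, when a model is considered, it can be
assumed that $\mathcal{N}$ has been dropped from $\mathcal{T}$.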

Dropping null states would be very important in situations where
learning takes place mostly at high levels, i.e., where the model specifies
that large collections of items change states at once or not at all.
For example, consider the one-element R-level model for M = 20. $\mathcal{T}$ in
the associated model would be of size $2^{20}$; however, $\mathcal{T} - \mathcal{N}$ would have
only two members: (U, U, ..., U) and (R, R, ..., R).
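One way to find the states that survive after null states are dropped is a reachability search from the states with positive start probability; everything not reached is null. The sketch below assumes a hypothetical encoding of the one-element R-level model in which all items move from U to R together, and it treats every other state as staying put (the text does not specify transitions out of such states, so that choice is an assumption).

```python
from collections import deque
from itertools import product


def r_level_successors(n_items):
    """Hypothetical successor function for a one-element R-level model."""
    all_u = ("U",) * n_items
    all_r = ("R",) * n_items

    def successors(state):
        if state == all_u:
            return {all_u, all_r}       # stay unlearned or learn the whole list
        return {state}                  # assumed: other states do not move
    return successors


def accessible_states(start_states, successors):
    """Breadth-first search over states reachable from the start states."""
    seen = set(start_states)
    queue = deque(start_states)
    while queue:
        state = queue.popleft()
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def null_states(all_states, start_states, successors):
    """States with zero start probability that can never be entered."""
    return set(all_states) - accessible_states(start_states, successors)
```

For example, with four items the full space has 16 states, of which only (U, U, U, U) and (R, R, R, R) are accessible from the all-unlearned start.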

The second method of reducing the cardinality of $\mathcal{T}$ (or $\mathcal{T} - \mathcal{N}$)
is illustrated by a typical P-level analysis of a paired-associate
experiment. In effect, a list of size M is reduced to M lists of size
one. This is possible since the transition probabilities and response
probabilities for a particular item are assumed not to depend on the
states of other items in the list or even the number and order of previous
presentations of other items. Thus, items that depend in no way on each
other in their course of learning can be analyzed as though they came
from separate lists. This observation is formalized in Theorem 4.4.

First, Definition 4.9 concerning the classes of item dependencies which
a model might postulate is stated.

In the definition to follow, the notion of the set of items depen- 
dent on an item is developed. This idea is then used to define "level of 
learning," and finally, a theorem about breaking a list into independent 
sublists is stated. 

Suppose $S_\alpha$ is a particular item in a list $\mathcal{S}$. By $D_\alpha$ is meant
the set of items in $\mathcal{S}$ dependent on item $S_\alpha$. An item $S_\beta$ is said to
be dependent on $S_\alpha$ in case any one of three possibilities obtains:
i. response probabilities to $S_\beta$ depend on the state of $S_\alpha$; ii. the
state of $S_\beta$ can change on trials when $S_\alpha$ is presented; or iii. the
transition probabilities for $S_\beta$, when $S_\alpha$ is not presented, depend on
the state of $S_\alpha$. These notions are formalized in Definition 4.9.

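Definition 4.9

Suppose a list $\mathcal{S}$ of M items and a model $(\mathcal{T}, \mathcal{P}, \mathcal{X})$ with
associated start vector $\vec{p}_1$. For each $\alpha$ = 1, 2, ..., M, the sets
$D_\alpha^1$, $D_\alpha^2$, and $D_\alpha^3$ are defined as follows:

i. $D_\alpha^1 = \{S_\beta : S_\beta \in \mathcal{S}$ and there exist $\vec{u}, \vec{v} \in \mathcal{T}$
differing only in their $\alpha$th position such that

$$\Pr(A_j \mid S_\beta, \vec{u}) \neq \Pr(A_j \mid S_\beta, \vec{v}) \} .$$

ii. $D_\alpha^2 = \{S_\beta : S_\beta \in \mathcal{S}$ and there exist $\vec{u}, \vec{v}$ with $u_\beta \neq v_\beta$ such
that the probability in $P_\alpha$ of a transition from $\vec{u}$ to $\vec{v}$ is greater than 0$\}$.

iii. $D_\alpha^3 = \{S_\beta : S_\beta \in \mathcal{S}$ and there are $\vec{u}, \vec{v} \in \mathcal{T}$ differing only in
their $\alpha$th position, $S_\gamma \in \mathcal{S}$ with $\gamma \neq \alpha$, and $t \in T$ such
that

$$\Pr(t_{\beta,N+1} \mid S_{\gamma,N}, \vec{u}_N) \neq \Pr(t_{\beta,N+1} \mid S_{\gamma,N}, \vec{v}_N) \} .$$

Then the set of items dependent on an item $S_\alpha$ is defined to be

$$D_\alpha = D_\alpha^1 \cup D_\alpha^2 \cup D_\alpha^3 .$$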

$D_\alpha^1$ in the preceding definition is just the set of items whose response
probabilities can be affected by the current state of item $S_\alpha$;
$D_\alpha^2$ is the set of items whose states can change when $S_\alpha$ is presented;
and $D_\alpha^3$ is the set of items whose transition probabilities can be affected
by the state of $S_\alpha$. $D_\alpha$, then, is the set of items dependent on
$S_\alpha$ in any of these senses.

In the next section, Definition 4.9 will be used to define a dependency
relation on $\mathcal{S}$. This relation will be extended to an equivalence
relation in order to define level of learning in terms of the partition
of $\mathcal{S}$ induced by the extended dependency relation.

Let us define a binary relation, D, on $\mathcal{S}$ in terms of the $D_\alpha$.
We say $S_\beta$ depends on $S_\alpha$, written $S_\beta D S_\alpha$, in case $S_\beta \in D_\alpha$, for
$\alpha, \beta$ = 1, 2, ..., M. Anticipating the development to follow, it would be
a desirable property if $\{D_\alpha : \alpha = 1, 2, \ldots, M\}$ forms a partition of
$\mathcal{S}$, i.e., if the $D_\alpha$ are mutually exclusive and $\bigcup_\alpha D_\alpha = \mathcal{S}$. Put another way,
it would be desirable if the relationship of item dependency were an
equivalence relation, i.e., reflexive, symmetric, and transitive. In this
case, as a later theorem will show, the list could conveniently be broken
into sublists, and each subject learning the list would provide one set
of data for each sublist.

Examples are the one-element P-level model, where $D_\alpha = \{S_\alpha\}$, and
the all-or-none multi-level model, where the equivalence classes are the
groups of M related items.

It would be unduly restrictive to require $\{D_\alpha\}$ to be a partition
of $\mathcal{S}$. Consider the mixed model (Atkinson and Estes, 1963) for the
following list:

stimulus    response
   ab          1
   bc          2
   cd          3

For this case $D_{ab}$ = {ab, bc}, $D_{bc}$ = {ab, bc, cd}, and $D_{cd}$ = {bc, cd},
certainly not a partition of $\mathcal{S}$. These results come from the fact that
the conditioning axioms for the mixed model require, for all stimuli x,
$D_x^2$ = {x} and $D_x^3$ = {x}; however, $D_x^1$ is the set of all stimuli which
share components with x. The dependency relation is reflexive and symmetric
for the mixed model, but not necessarily transitive. (Parenthetically,
one could test transitivity for such a list by manipulating
presentation sequences and observing whether preceding presentations of
ab affect response probabilities to cd. Although this experimental
question is of interest to the author, it will not be pursued in this
paper.)

Since it is too restrictive to assume that the dependency relation
D is an equivalence relation, we define an extension of D to an equivalence
relation and base an operation of breaking a list into sublists
on this extended equivalence relation.



Definition 4.10

Suppose a list $\mathcal{S}$ of M items, a model $\mathcal{M}$ with associated
start vector $\vec{p}_1$, and the dependency relation D induced by $\mathcal{M}$.
Define D*, the levels extension of D, to be the minimal equivalence
relation containing D, i.e., D* - D has the fewest members.

Strictly speaking, the preceding requires a result from set theory
to be a proper definition. Clearly, $\mathcal{S} \times \mathcal{S}$ is an equivalence relation
containing D. Hence, setting D* equal to the intersection of all such
equivalence relations containing D, which is easily shown to yield an
equivalence relation, suffices to establish the existence of D*. Since
$\mathcal{S}$ is assumed finite, D* can be easily obtained by construction. One
simply adds to D all pairs from $\mathcal{S} \times \mathcal{S}$ necessary to satisfy symmetry,
transitivity, and the reflexive property. Denote by $\mathcal{S}^*$ the partition
of $\mathcal{S}$ induced by D*.






To illustrate the preceding process, consider the mixed model for
the following two lists:

        LIST 1                      LIST 2
stimulus    response        stimulus    response
   ab          1               ab          1
   bc          2               bc          2
   cd          3               cd          3
   ef          4               de          4
                               ef          5

For List 1 we have

D' = {(ab, ab), (bc, bc), (cd, cd), (ef, ef), (bc, ab), (ab, bc),
(cd, bc), (bc, cd)};

D'* = {ab, bc, cd} x {ab, bc, cd} U {(ef, ef)},

in other words the pairs (ab, cd) and (cd, ab) have been added to D' to
satisfy transitivity; and finally

$\mathcal{S}'^*$ = {{ab, bc, cd}, {ef}}.

However, for List 2,

D'' = D' U {(de, de), (ef, de), (cd, de), (de, ef), (de, cd)};

D''* = {ab, bc, cd, de, ef} x {ab, bc, cd, de, ef};

and

$\mathcal{S}''^*$ = {{ab, bc, cd, de, ef}}.

In other words the addition of item de to List 1 is sufficient to tie
all the stimuli in the list together in the sense of Definition 4.10.
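The construction of the levels partition from a dependency relation amounts to finding connected components, which can be sketched with a small union-find. The dependency test used here (two stimuli depend on each other when they share a component letter) is an assumption meant to mimic the mixed-model example above; the function names are illustrative.

```python
def share_component(x, y):
    """Assumed mixed-model dependency: stimuli sharing a component letter."""
    return bool(set(x) & set(y))


def levels_partition(items, depends):
    """Partition induced by the minimal equivalence relation containing `depends`."""
    parent = {item: item for item in items}

    def find(item):
        # Follow parent links to the class representative, compressing the path.
        while parent[item] != item:
            parent[item] = parent[parent[item]]
            item = parent[item]
        return item

    for a in items:
        for b in items:
            if depends(a, b):
                parent[find(a)] = find(b)   # merge the two classes

    classes = {}
    for item in items:
        classes.setdefault(find(item), set()).add(item)
    return sorted(sorted(cell) for cell in classes.values())
```

Applied to List 1 this yields the cells {ab, bc, cd} and {ef}; adding de, as in List 2, merges everything into a single cell.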

We are finally in the position to offer a possible definition of the
notion of "level of learning" used extensively in Chapters 2 and 5. By
a level of learning is meant a partition of $\mathcal{S}$, i.e., a collection of
subsets of $\mathcal{S}$ which are mutually disjoint and exhaustive. By the highest
level of learning for a list $\mathcal{S}$ and model $\mathcal{M}$ is meant the finest partition
(one with the most equivalence classes) of $\mathcal{S}$ for which items in
different subsets are mutually independent, that is, if $S_\alpha \in A \subset \mathcal{S}$ and
$S_\beta \in B \subset \mathcal{S}$, and if $A \neq B$, then not $S_\alpha D S_\beta$ and not $S_\beta D S_\alpha$. It will turn
out that $\mathcal{S}^*$ is the appropriate partition.



In the theorem to follow we will base a method of breaking a list
into sublists corresponding to the equivalence classes of $\mathcal{S}^*$. At the
same time the list is broken into sublists the model can be broken
into corresponding submodels. The procedure is analogous to breaking one
long string of errors and successes into a group of short sequences, one
for each item, as is done in P-level analyses. The important property,
captured in the theorem, is that presentations of items outside of a
fixed cell in $\mathcal{S}^*$ act as "dead trials" relative to changes of state
probabilities and response probabilities of members of the equivalence
class.

Unfortunately, the theorem, although fairly intuitive, becomes very
cumbersome from a notational standpoint. Therefore, some of the notation
used in the theorem will be defined as follows.

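Let $\mathcal{S}_N$ be the set of all possible presentation sequences through trial
N. Then

$$\mathcal{S}_N = \underbrace{\mathcal{S} \times \mathcal{S} \times \cdots \times \mathcal{S}}_{N\text{ times}} .$$

For each $B_i \in \mathcal{S}^*$ define $B_{i,N}$ to be the set of all sequences of presentations
from $B_i$ for the first N trials. Then

$$B_{i,N} = \underbrace{B_i \times B_i \times \cdots \times B_i}_{N\text{ times}} .$$

Each sequence $s_N \in \mathcal{S}_N$ can be decomposed into subsequences, such that a
particular subsequence, $s_N^i$, represents all presentations of members of
a particular $B_i$. Thus, for each $B_i \in \mathcal{S}^*$, $s_N^i$ is a subsequence of
length p, $0 \le p \le N$, consisting only of the members of $B_i$ listed in
$s_N$. Finally, for each $\vec{t} \in \mathcal{T}$ define $\vec{t}(i)$ to be the set of all
$\vec{u} \in \mathcal{T}$ which agree with $\vec{t}$ in coordinate positions corresponding to the
states of items in $B_i$, i.e.,

$$\vec{t}(i) = \{\vec{u} : \vec{u} \in \mathcal{T} \text{ and } u_\alpha = t_\alpha \text{ for all } \alpha \text{ such that } S_\alpha \in B_i\} .$$

It should be clear that for each $B_i \in \mathcal{S}^*$, $\{\vec{t}(i) : \vec{t} \in \mathcal{T}\}$ forms a partition
of $\mathcal{T}$.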

With these notational devices in mind, we are prepared to state the
major theorem of this section. Theorem 4.4 asserts that the response
probabilities to an item in a particular equivalence class, $B_i \in \mathcal{S}^*$,
depend only on the order and number of preceding items in that equivalence
class.

Theorem 4.4

Suppose $\mathcal{S}$ is a set of M items with model $(\mathcal{T}, \mathcal{P}, \mathcal{X})$
and associated start vector $\vec{p}_1$. Let $\mathcal{S}^* = \{B_1, B_2, \ldots, B_V\}$ be
the levels partition of $\mathcal{S}$ induced by the equivalence relation D*.
Then, for all $B_i \in \mathcal{S}^*$, N = 1, 2, ..., $s_N \in \mathcal{S}_N$, and responses
$A_j \in A$,

$$\Pr(A_{j,N+1} \mid S_{\alpha,N+1}, s_N) = \Pr(A_{j,N+1} \mid S_{\alpha,N+1}, s_N^i)$$

for i = 1, 2, ..., V and $S_\alpha \in B_i$.



Proof 

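By assumption of $S_\alpha \in B_i$, response probabilities to $S_\alpha$ depend only on
the states of items in $B_i$, hence we have

$$\Pr(A_{j,N+1} \mid S_{\alpha,N+1}, s_N) = \sum_{\vec{t}(i)} \Pr(A_{j,N+1} \mid S_{\alpha,N+1}, \vec{t}(i))\, \Pr(\vec{t}_{N+1}(i) \mid s_N) ,$$

where the sum is over the equivalence classes $\vec{t}(i)$ induced by $B_i$,
i.e., the subsets of $\mathcal{T}$ corresponding to each sequence of states of items
in $B_i$. The next step involves noting that for all $s_N \in \mathcal{S}_N$,

$$\Pr(\vec{t}_{N+1}(i) \mid s_N) = \Pr(\vec{t}_{N+1}(i) \mid s_N^i) .$$

This is established by summing over members of the set $\vec{t}_{N+1}(i)$ as follows:

$$\Pr(\vec{t}_{N+1}(i) \mid s_N) = \sum_{\vec{u} \in \vec{t}(i)} \Big[\vec{p}_1 \prod_{k=1}^{N} P_{\alpha_k}\Big]_{\vec{u}} = \sum_{\vec{u}' \in \vec{t}(i)} \Big[\vec{p}_1 \prod_{k=1}^{p} P_{\alpha'_k}\Big]_{\vec{u}'} = \Pr(\vec{t}_{N+1}(i) \mid s_N^i) ,$$

where $\alpha'_1, \ldots, \alpha'_p$ index the presentations in $s_N^i$.

The last two steps follow, since the states of items in $B_i$ can change
only on presentations of items in $B_i$, and, further, the transition
probabilities for items in $B_i$ do not depend on the states of items not
in $B_i$. The reader not convinced of this should note that, for all
$S_\beta \notin B_i$, and any $S_\alpha \in B_i$, and $\vec{u}, \vec{v} \in \mathcal{T}$ with $u_\alpha \neq v_\alpha$,
the probability in $P_\beta$ of a transition from $\vec{u}$ to $\vec{v}$ is 0.

Putting these results together yields the theorem.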

The gist of this theorem is that presentations of items not in a
cell $B_i \in \mathcal{S}^*$ do not affect the states or response probabilities of items
in $B_i$. Consequently, each cell of $\mathcal{S}^*$ can be studied as an independent
sublist of $\mathcal{S}$. The associated model $\mathcal{M}_i$ for the sublist corresponding
to $B_i$ will have state space $\mathcal{T}_i$, where

$$\mathcal{T}_i = \{\vec{t}(i) : \vec{t} \in \mathcal{T}\} .$$

If $B_i$ has $M_i$ members, then $\mathcal{T}_i$ will have $L^{M_i}$ members. Finally,
the transition probabilities among the members of $\mathcal{T}_i$ are
determined from $\mathcal{P}$, since the probability of a transition from $\vec{u}$ into the
class of $\vec{v}$ is the same for all pairs $\vec{u} \in \vec{t}(i)$, $\vec{v} \in \vec{t}\,'(i)$.

Thus far we have discussed how inaccessible states in $\mathcal{T}$ (namely,
the $\mathcal{N}$ states) can be eliminated. Also we have considered how, in certain
cases, a list of M items can be partitioned into shorter sublists
with a consequent reduction in the number of states needed to characterize
the learning of each sublist. A third possibility for reducing the size
of $\mathcal{T}$ (or $\mathcal{T} - \mathcal{N}$ or $\mathcal{T}_i$ for $B_i \in \mathcal{S}^*$) is to reduce the number of states
in the item state space T. This operation would reduce L and hence
$L^M$. In this section we consider briefly the notion of lumping (combining)
certain of the states in $\mathcal{T}$. "Lumping" is a technical topic in the Markov
chain literature (cf. Burke and Rosenblatt, 1958; Kemeny and Snell, 1960,
pp. 132-140), and its use within the framework is a highly model-specific
question.

The basic idea of lumping (or combining) the states of a Markov
chain is as follows. Suppose M is a Markov chain with state space
$X = \{x_1, \ldots, x_n\}$. Let $Y = \{y_1, \ldots, y_m\}$ be a partition of X,
i.e., Y consists of pair-wise disjoint subsets of X whose union is X.
If the state space Y forms a Markov chain, then Y is said to be a
valid lumping of the states in X. A sufficient condition for the lumped
process Y to represent the state space for a Markov chain is as follows
(Burke and Rosenblatt, 1958): Y is a valid lumping of X in case, for
each $y_i, y_j \in Y$,

$$\Pr(x \to y_j) = \Pr(x' \to y_j)$$

for all $x, x' \in y_i$, where $\Pr(x \to y_j)$ is the probability of transition
from state $x \in X$ to the class of states $y_j \subset X$. It should be noted
that if this condition is satisfied $\Pr(y_i \to y_j)$ is well defined; otherwise
it is not.
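The criterion just stated translates directly into a check on a transition matrix and a candidate partition. The sketch below is an illustration only: states are row/column indices, a partition is a list of index lists, and the function returns the lumped matrix when the criterion holds and `None` otherwise. The names and the tolerance are assumptions.

```python
def lump(matrix, partition, tol=1e-12):
    """Return the lumped transition matrix, or None if the criterion fails.

    For every cell of the partition, all states in that cell must have the
    same total probability of moving into each cell.
    """
    lumped = []
    for cell in partition:
        cell_row = None
        for x in cell:
            into_cells = [sum(matrix[x][y] for y in target) for target in partition]
            if cell_row is None:
                cell_row = into_cells
            elif any(abs(a - b) > tol for a, b in zip(cell_row, into_cells)):
                return None             # two states in one cell disagree
        lumped.append(cell_row)
    return lumped


def example_chain(c):
    """Hypothetical four-state chain used only to exercise `lump`."""
    h = 0.5 * c
    return [
        [1.0, 0.0, 0.0, 0.0],
        [h, 1.0 - h, 0.0, 0.0],
        [h, 0.0, 1.0 - h, 0.0],
        [0.0, h, h, 1.0 - c],
    ]
```

With the example chain, grouping the two middle states gives a valid three-state lumping, whereas grouping the first and last states does not, because those two states have different probabilities of staying in their own cell.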

Depending on the model, this condition can be used to lump the
states in each $\mathcal{T}_i$, $B_i \in \mathcal{S}^*$, in such a way as to reduce the size of $\mathcal{T}_i$. In
most cases this method of reducing $\mathcal{T}$ depends on the particular model;
however, there is one case for which a somewhat general condition permitting
lumping of the states in $\mathcal{T}$ can be established. In the case
where all M items in a list are similar in a sense to be established
in Definition 4.11, it is possible to lump the states of $\mathcal{T}$ if each item
is equally likely to appear on any trial. This condition is established
in Theorem 4.5. First we formalize the notion of a "symmetric model"
which plays a role in this theorem.

Definition 4.11

Suppose a list $\mathcal{S}$ of M items and a model $\mathcal{M} = (\mathcal{T}, \mathcal{P}, \mathcal{X})$.
$\mathcal{M}$ is said to be symmetric in case $\mathcal{P}$ is left unchanged by
permuting the order of stimuli listed in the state space $\mathcal{T}$.

A symmetric model has the property that all items are treated alike
by the model. By this is meant that the same set of matrices, $\mathcal{P}$, would
be obtained for any ordering of the items in the definition of the state
space of the list, i.e., there is a matrix in $\mathcal{P}$ associated with the
first listed item in the state space, one associated with the second, ...,
and further, these matrices do not depend on which item is listed first,
second, .... As an example, consider the one-element P-level model
for a two-item list. Let S and S' denote the two items. Eqs. (4.8)
and (4.9) give the two members of $\mathcal{P}$ for the S'S order of listing the
states of the items. It should be clear that if the items were listed
as SS' in the state space, the same two matrices would be obtained,
except Eq. (4.8) would apply on S trials and Eq. (4.9) on S' trials
instead of the reverse set-up for the S'S order of listing states of
items in $\mathcal{T}$.

All of the models considered in this chapter are symmetric models
in the sense of Definition 4.11. One type of consideration that would
tend to make a model asymmetric would be items of unequal difficulty. To
illustrate, consider the aforementioned one-element P-level model for
the list $\mathcal{S}$ = {S, S'}; however, suppose S is an easier item to learn
than S'. Accordingly, let c be greater than c'. Suppose the state
of the list is listed in the SS' order; then the resulting matrices are
as follows:

$$P = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ c & 0 & 1-c & 0 \\ 0 & c & 0 & 1-c \end{bmatrix} \tag{4.17}$$

and

$$P' = \begin{bmatrix} 1 & 0 & 0 & 0 \\ c' & 1-c' & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & c' & 1-c' \end{bmatrix} . \tag{4.18}$$

However, if S' is listed before S, the resulting matrices are

$$P' = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ c' & 0 & 1-c' & 0 \\ 0 & c' & 0 & 1-c' \end{bmatrix} \tag{4.19}$$

and

$$P = \begin{bmatrix} 1 & 0 & 0 & 0 \\ c & 1-c & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & c & 1-c \end{bmatrix} . \tag{4.20}$$

Since these two sets of matrices (Eqs. (4.17) and (4.18) vs. Eqs. (4.19)
and (4.20)) are not the same, the model is not symmetric. One important
thing to note about requiring a model to be symmetric is that it places
no restriction on what types of item dependencies are possible, e.g.,
the all-or-none multi-level model is a symmetric model.

If a model is symmetric and is tested by a Bernoulli presentation
schedule with $\pi_i$ = 1/M for i = 1, 2, ..., M (see Definition 4.6),
it is possible to lump the states of the average matrix (see Theorem 4.2),

$$A = \frac{1}{M} \sum_{i=1}^{M} P_i . \tag{4.21}$$

The lumping permitted by these conditions produces a partition of $\mathcal{T}$
defined as follows. Suppose a list, $\mathcal{S}$, of M items with item state
space $T = \{t_1, \ldots, t_L\}$. The counting partition of $\mathcal{T}$, $\mathcal{E}$,
is defined as follows:

$$\mathcal{E} = \Big\{ \vec{e} = (e_1, e_2, \ldots, e_L) : e_i \text{ is a number between 0 and } M \text{ representing the number of items in state } t_i \text{ for } i = 1, 2, \ldots, L; \text{ and } \sum_{i=1}^{L} e_i = M \Big\} . \tag{4.22}$$

Actually $\mathcal{E}$ itself is not a partition of $\mathcal{T}$, but corresponding to each
$\vec{e} \in \mathcal{E}$ there is a subset $\bar{e}$ of $\mathcal{T}$ whose vectors have $e_i$ items in
state $t_i$ (i = 1, 2, ..., L). It is these subsets, $\bar{e}$, which form a
partition of $\mathcal{T}$ and correspond to the states of the lumped process presented
in Theorem 4.5.

Theorem 4.5

Suppose $\mathcal{M} = (\mathcal{T}, \mathcal{P}, \mathcal{X})$ is a symmetric model for a list of M
items. Suppose that items are presented with a Bernoulli presentation
schedule with $\pi_i$ = 1/M for i = 1, 2, ..., M. Then the
Markov chain with state space $\mathcal{T}$ and stochastic matrix given by
Eq. (4.21) lumps to a Markov chain with state space $\mathcal{E}$.

Proof

The proof proceeds by using the Burke-Rosenblatt criterion discussed
on p. 82 of this chapter. Let $\bar{e}$ and $\bar{e}'$ be any two sets in $\mathcal{E}$. Then,
for all $\vec{t}_i, \vec{t}_j \in \bar{e}$, we have

$$\Pr(\vec{t}_i \to \bar{e}') = \Pr(\vec{t}_j \to \bar{e}') . \tag{4.23}$$

Eq. (4.23) comes from the fact that $\mathcal{M}$ is symmetric. That is,

$$\Pr(\vec{t}_i \to \bar{e}') = \frac{1}{M} \sum_{k=1}^{M} \Pr\nolimits_k(\vec{t}^{\,*} \to \bar{e}') ,$$

where $\Pr_k$ denotes transition probabilities under $P_k$ and
$\vec{t}^{\,*}$ is the vector corresponding to $\vec{t}_i$ when the order of items
in the state space $\mathcal{T}$ is permuted to list items in state $t_1$ first, in
$t_2$ second, ..., and items in $t_L$ last. Since the model is symmetric,
the resulting set of matrices $\mathcal{P}^*$ is the same as $\mathcal{P}$ and hence the above
equation holds. A similar argument implies

$$\Pr(\vec{t}_j \to \bar{e}') = \frac{1}{M} \sum_{k=1}^{M} \Pr\nolimits_k(\vec{t}^{\,*} \to \bar{e}') ,$$

and hence Eq. (4.23) follows. This result establishes that the Burke-Rosenblatt
criterion holds; and hence, the lumping of $\mathcal{T}$ to $\mathcal{E}$ is
valid. ||

Next, we indicate how this theorem, along with Theorem 4.2, might
be used in the analysis of a particular symmetric model. Theorem 4.5
and Theorem 4.2 are used in several places in Chapter 5 to derive particular
models from a general model in the sense of Definition 4.5 (see
pp. 112-117 for Restle's strategy-selection theory). Consider the one-element
P-level model for M = 2. The two matrices, $P_1$ and $P_2$, are
given by Eqs. (4.8) and (4.9). Suppose a Bernoulli presentation schedule
with $\pi_1 = \pi_2 = \tfrac{1}{2}$ (see Definition 4.6); then $A_{1/2}$, for Theorem 4.2, is
as follows:

$$A_{1/2} = \begin{array}{c|cccc} & (L,L) & (L,U) & (U,L) & (U,U) \\ \hline (L,L) & 1 & 0 & 0 & 0 \\ (L,U) & \tfrac{1}{2}c & 1-\tfrac{1}{2}c & 0 & 0 \\ (U,L) & \tfrac{1}{2}c & 0 & 1-\tfrac{1}{2}c & 0 \\ (U,U) & 0 & \tfrac{1}{2}c & \tfrac{1}{2}c & 1-c \end{array} \tag{4.24}$$

Since the model is symmetric (see p. 82), we can use Theorem 4.5 to
lump the state space $\mathcal{T}$ = {(L,L), (L,U), (U,L), (U,U)}. The counting
partition, $\mathcal{E}$, is given by

$$\mathcal{E} = \{\{(L,L)\},\ \{(L,U), (U,L)\},\ \{(U,U)\}\} \tag{4.25}$$

(denote these three sets by $e_2$, $e_1$, and $e_0$, respectively). Using
Theorem 4.5, we obtain the following stochastic matrix for the lumped
process with state space $\mathcal{E}$:

$$\begin{array}{c|ccc} & e_2 & e_1 & e_0 \\ \hline e_2 & 1 & 0 & 0 \\ e_1 & \tfrac{1}{2}c & 1-\tfrac{1}{2}c & 0 \\ e_0 & 0 & c & 1-c \end{array} \tag{4.26}$$
This derived matrix can be used as a Markov model to describe the
error-success process on the pair of items {$S_1$, $S_2$} under a Bernoulli
presentation schedule with $\pi_1 = \pi_2 = \tfrac{1}{2}$. A model for error-success
sequences is conventionally displayed as a transition matrix among theoretical
states along with a column matrix of the probability of a correct
response given a particular state (cf. Atkinson, Bower, and Crothers,
1965, pp. 89, 255, and 505). Accordingly, the model for error-success
sequences on {$S_1$, $S_2$} derived from the one-element P-level model is
displayed in Eq. (4.27) as follows:

$$\begin{array}{c|ccc|c} & e_{2,n+1} & e_{1,n+1} & e_{0,n+1} & \Pr(\text{correct} \mid \text{row state}) \\ \hline e_{2,n} & 1 & 0 & 0 & 1 \\ e_{1,n} & \tfrac{1}{2}c & 1-\tfrac{1}{2}c & 0 & \tfrac{1}{2}(1+g) \\ e_{0,n} & 0 & c & 1-c & g \end{array} \tag{4.27}$$
where g is the probability of a correct response for a presented item
in state U. It is of some interest to note that the model in Eq. (4.27)
is formally identical to the two-element pattern model axiomatized by
Suppes and Atkinson (1960, pp. 14-17). The equivalence comes from interpreting
the stimuli $S_1$ and $S_2$ as the two patterns. The Bernoulli
schedule employed guarantees that each stimulus (pattern) is sampled on
each trial with probability 1/2. It is interesting to note that Suppes,
et al. (1962) lumped the four-state matrix for the two-element model to
one equivalent to Eq. (4.26). The preceding observations suggest that,
if the particular stimulus giving rise to an error or success on trial n
is suppressed in the level of analysis corresponding to the analysis of
the error-success process on M items for a Bernoulli presentation schedule,
then the resulting model bears a resemblance to an M-element pattern
model. However, in the case of more complex multi-level models
than the one-element P-level model, it is possible for patterns
(stimuli) to change their states when not "sampled" (presented).

One final comment about the model represented by Eq. (4.27) is
needed. Transitions from state $e_1$ to $e_2$ have different probabilities
following an error than following a success. This is because an error in
$e_1$ implies the unlearned item has been sampled; whereas, a success in
$e_1$ does not determine the state of the presented stimulus. This feature
is shared by the two-element pattern model of Suppes, et al. (1962).

Although analyses of the two-element model can be found in the literature
(cf. Bower and Theios, 1965), in general, there may be more than two items
in the list. When M > 2, analysis of the resulting model (obtained by
Theorems 4.2 and 4.5) is best done by computer.

Bernbach (1966) has proposed a computerizable scheme for analyzing
Markov models. To use Bernbach's scheme, it is necessary to expand each
state into an error and a success state. When this expansion is accomplished,
the differential probability of learning after an error or success
is embodied in the matrix. To illustrate, Eq. (4.27) can be so
expanded; the result is as follows:

$$\begin{array}{c|ccccc|c} & e_2 & e_1^S & e_1^E & e_0^S & e_0^E & \Pr(\text{correct} \mid \text{row state}) \\ \hline e_2 & 1 & 0 & 0 & 0 & 0 & 1 \\ e_1^S & A & (1-A)g' & (1-A)(1-g') & 0 & 0 & 1 \\ e_1^E & c & (1-c)g' & (1-c)(1-g') & 0 & 0 & 0 \\ e_0^S & 0 & cg' & c(1-g') & (1-c)g & (1-c)(1-g) & 1 \\ e_0^E & 0 & cg' & c(1-g') & (1-c)g & (1-c)(1-g) & 0 \end{array} \tag{4.28}$$

where $g' = \tfrac{1}{2}(1+g)$ and

$$A = \frac{cg}{1+g} . \tag{4.29}$$

Bernbach's scheme could be directly applied to Eq. (4.28) to generate the
statistics for the error-success process on the item pair {$S_1$, $S_2$}.
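The error/success expansion can be checked numerically. The sketch below builds a five-state matrix over (e2, e1-success, e1-error, e0-success, e0-error) for the two-item P-level model under an equal-probability Bernoulli schedule. It rests on an assumption that should be read as such: the probability of moving to e2 after a success in e1 is obtained by Bayes' rule (the success came from the unlearned item with probability g/(1+g), and that item is then learned with probability c). Function names are illustrative.

```python
def learn_after_success(c, g):
    """Pr(e1 -> e2 | success in e1), by Bayes' rule (assumed derivation)."""
    p_success = 0.5 * 1.0 + 0.5 * g            # learned item, or lucky guess
    p_unlearned_given_success = 0.5 * g / p_success
    return c * p_unlearned_given_success        # equals c*g / (1 + g)


def expanded_matrix(c, g):
    """Five-state chain; row/column order: e2, e1S, e1E, e0S, e0E."""
    g1 = 0.5 * (1.0 + g)                        # Pr(correct | e1)
    a = learn_after_success(c, g)
    e0_row = [0.0, c * g1, c * (1 - g1), (1 - c) * g, (1 - c) * (1 - g)]
    return [
        [1.0, 0.0, 0.0, 0.0, 0.0],
        [a, (1 - a) * g1, (1 - a) * (1 - g1), 0.0, 0.0],
        [c, (1 - c) * g1, (1 - c) * (1 - g1), 0.0, 0.0],
        list(e0_row),                           # learning in e0 ignores the response
        list(e0_row),
    ]
```

A useful consistency check is that averaging the two e1 rows' learning probabilities, weighted by the probability of success and error in e1, recovers c/2, the unconditional e1-to-e2 transition probability of the three-state model.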

In the next chapter we present some detailed analyses of several
models, using the theorems and methods developed for the framework
presented in this chapter. It should be emphasized that the tractability
of a model within the framework depends on the construction of clever
experiments designed to reduce the state space, and consequently the
transition matrices, to manageable proportions.






CHAPTER 5 



APPLICATIONS OF THE FRAMEWORK 

In this chapter several applications of the preceding framework will
be presented. It is hoped that these examples will illustrate the
flexibility of the approach to the problem of levels of learning presented in
Chapter 4. The framework is applied to the mixed model, the all-or-none
multi-level model, and Restle's strategy-selection theory.

The Mixed Model 

Atkinson and Estes (1963) develop the mixed model for the learning
of the following miniature list:

    Stimulus    Response
    ab          1
    bc          2

The assumptions are that each pattern is in a state U or state L,
where L is an absorbing state, and items are presumed to start in U.
Responses to an item in U are governed by the stimulus components in
the sense that if a pattern is in L, then its components are assumed to
be connected to the response associated with that pattern. Thus, if ab
is in U and bc is in L, the probability of response 1 to ab is
1/2 × 1/2 + 1/2 × 0 = 1/4, where with probability 1/2 the subject uses
component a, which is unconnected to either response 1 or 2, and with
probability 1/2 he uses b, which is, by assumption, connected to
response 2. The one-element P-level model is assumed to govern the
learning of patterns; hence dependencies among items are produced only by the
response rules, which are based on the conditioning of common components.



The authors assume a Bernoulli presentation schedule (see p. 66, Chapter 4)
with $\pi = 1/2$. They derive a sort of "average" matrix of transition
probabilities among the states of the list, (U,U), (U,L), (L,U),
(L,L). It is as follows:

$$
\begin{array}{c|cccc}
 & (L,L) & (L,U) & (U,L) & (U,U) \\ \hline
(L,L) & 1 & 0 & 0 & 0 \\
(L,U) & \tfrac{1}{2}c & 1-\tfrac{1}{2}c & 0 & 0 \\
(U,L) & \tfrac{1}{2}c & 0 & 1-\tfrac{1}{2}c & 0 \\
(U,U) & 0 & \tfrac{1}{2}c & \tfrac{1}{2}c & 1-c
\end{array}
\tag{5.1}
$$
This matrix is raised to the nth power to get state probabilities, where
n indexes presentations of either stimulus.








For the record, the response probabilities given the item presented
and the state of the list are as follows:

(5.2)

    Pr(A_1)    Stimulus presented    State of list
    1          ab                    LL
    1          ab                    LU
    1/4        ab                    UL
    1/2        ab                    UU
    0          bc                    LL
    3/4        bc                    LU
    0          bc                    UL
    1/2        bc                    UU
Now let us analyze this example within the framework of the preceding
chapter.^ The list consists of two members, $S_1 = ab$ and $S_2 = bc$.
The response set consists of 1 and 2. The item state space is
$T = \{U, L\}$, and the state space for the list is $T \times T$. The set
of transition matrices consists of the following two matrices, where
$P_1$ applies on presentations of $S_1$ and $P_2$ on presentations of $S_2$:



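$$
P_1 =
\begin{array}{c|cccc}
 & (L,L) & (L,U) & (U,L) & (U,U) \\ \hline
(L,L) & 1 & 0 & 0 & 0 \\
(L,U) & 0 & 1 & 0 & 0 \\
(U,L) & c & 0 & 1-c & 0 \\
(U,U) & 0 & c & 0 & 1-c
\end{array}
\tag{5.3}
$$

and

$$
P_2 =
\begin{array}{c|cccc}
 & (L,L) & (L,U) & (U,L) & (U,U) \\ \hline
(L,L) & 1 & 0 & 0 & 0 \\
(L,U) & c & 1-c & 0 & 0 \\
(U,L) & 0 & 0 & 1 & 0 \\
(U,U) & 0 & 0 & c & 1-c
\end{array}
\tag{5.4}
$$

The response rule is presented in Eq. (5.2). The start vector is
$p_1 = (0,0,0,1)$, and the levels partition is $\{\{ab, bc\}\}$ (see p. 76,
Chapter 4).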

The model is a commutative model, since

$$
P_1 P_2 = P_2 P_1 =
\begin{bmatrix}
1 & 0 & 0 & 0 \\
c & 1-c & 0 & 0 \\
c & 0 & 1-c & 0 \\
c^2 & c(1-c) & c(1-c) & (1-c)^2
\end{bmatrix}
\tag{5.5}
$$

^ Portions of this analysis are reported in Batchelder, Bjork, and
Yellott (1966, Ch. 8, problem 8.G.2).
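The commutativity claim of Eq. (5.5) can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function names and the value c = 3/10 are illustrative assumptions, and the matrices are entered in the state order (L,L), (L,U), (U,L), (U,U) used in Eqs. (5.3) and (5.4).

```python
from fractions import Fraction


def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mixed_matrices(c):
    """Return (P1, P2) for the mixed model; P1 applies when ab is presented."""
    p1 = [[1, 0, 0, 0],
          [0, 1, 0, 0],
          [c, 0, 1 - c, 0],
          [0, c, 0, 1 - c]]
    p2 = [[1, 0, 0, 0],
          [c, 1 - c, 0, 0],
          [0, 0, 1, 0],
          [0, 0, c, 1 - c]]
    return p1, p2


c = Fraction(3, 10)  # hypothetical learning rate, chosen only for illustration
p1, p2 = mixed_matrices(c)
p12 = matmul(p1, p2)
p21 = matmul(p2, p1)
commutes = p12 == p21

print(commutes)   # whether the two products agree entry by entry
print(p12[3])     # the (U,U) row: c^2, c(1-c), c(1-c), (1-c)^2
```

Exact rational arithmetic is used so that the equality test is not disturbed by rounding.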
Next we apply the theorems of Chapter 4 to analyze the mixed model in
terms of the framework. First, suppose $k_1$ $S_1$ and $k_2$ $S_2$ presentations
in any order for the first $k_1 + k_2 = N$ trials. Then, according to
Theorem 4.3 for commutative models, we have

$$
p_{N+1} = p_1 P_1^{k_1} P_2^{k_2}
= (0,0,0,1)
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
1-(1-c)^{k_1} & 0 & (1-c)^{k_1} & 0 \\
0 & 1-(1-c)^{k_1} & 0 & (1-c)^{k_1}
\end{bmatrix}
\begin{bmatrix}
1 & 0 & 0 & 0 \\
1-(1-c)^{k_2} & (1-c)^{k_2} & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 1-(1-c)^{k_2} & (1-c)^{k_2}
\end{bmatrix}
$$

$$
= \Bigl( [1-(1-c)^{k_1}][1-(1-c)^{k_2}],\;
[1-(1-c)^{k_1}](1-c)^{k_2},\;
[1-(1-c)^{k_2}](1-c)^{k_1},\;
(1-c)^{k_1+k_2} \Bigr)
\tag{5.6}
$$

Response probabilities can be easily determined using Eq. (4.11) and the
response rules of Eq. (5.2). Constructing an experiment by varying the
presentation order of the $S_1$ and $S_2$ stimuli would provide a strong
test for the mixed model.
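The order-independence expressed by Eq. (5.6) can be illustrated by applying the two matrices in several different orders and comparing with the closed form. This is a sketch under stated assumptions: the presentation strings, the value c = 1/4, and the function names are hypothetical and serve only as an example.

```python
from fractions import Fraction


def vecmat(v, m):
    """Multiply a row vector by a square matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def state_probs(order, c):
    """State probabilities after presenting S1 ('1') and S2 ('2') in `order`."""
    p1 = [[1, 0, 0, 0], [0, 1, 0, 0], [c, 0, 1 - c, 0], [0, c, 0, 1 - c]]
    p2 = [[1, 0, 0, 0], [c, 1 - c, 0, 0], [0, 0, 1, 0], [0, 0, c, 1 - c]]
    v = [0, 0, 0, 1]  # both items start unlearned, state (U,U)
    for stimulus in order:
        v = vecmat(v, p1 if stimulus == "1" else p2)
    return v


def closed_form(k1, k2, c):
    """Closed form for k1 S1 and k2 S2 presentations; states LL, LU, UL, UU."""
    q1, q2 = (1 - c) ** k1, (1 - c) ** k2
    return [(1 - q1) * (1 - q2), (1 - q1) * q2, q1 * (1 - q2), q1 * q2]


c = Fraction(1, 4)                      # illustrative value only
orders = ["11222", "21212", "22211"]    # each has k1 = 2 and k2 = 3
results = [state_probs(order, c) for order in orders]
target = closed_form(2, 3, c)
print(all(result == target for result in results))
```

All three orders contain the same numbers of each stimulus, so a commutative model must assign them the same state distribution.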

Next we consider a Bernoulli presentation schedule with $\pi = \Pr(S_1)$.
To use Theorem 4.2, we first compute $A_\pi = \pi P_1 + (1-\pi) P_2$. The result is

$$
A_\pi =
\begin{bmatrix}
1 & 0 & 0 & 0 \\
(1-\pi)c & 1-(1-\pi)c & 0 & 0 \\
\pi c & 0 & 1-\pi c & 0 \\
0 & \pi c & (1-\pi)c & 1-c
\end{bmatrix}
\tag{5.7}
$$

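Now we use Theorem 4.2 to determine the state probabilities on trial N. The
result is

$$
p_N = p_1 A_\pi^{N-1} = (0,0,0,1)
\begin{bmatrix}
1 & 0 & 0 & 0 \\
1-[1-(1-\pi)c]^{N-1} & [1-(1-\pi)c]^{N-1} & 0 & 0 \\
1-(1-\pi c)^{N-1} & 0 & (1-\pi c)^{N-1} & 0 \\
1+(1-c)^{N-1}-[1-(1-\pi)c]^{N-1}-(1-\pi c)^{N-1} & [1-(1-\pi)c]^{N-1}-(1-c)^{N-1} & (1-\pi c)^{N-1}-(1-c)^{N-1} & (1-c)^{N-1}
\end{bmatrix}
$$

$$
= \Bigl( 1+(1-c)^{N-1}-[1-(1-\pi)c]^{N-1}-(1-\pi c)^{N-1},\;
[1-(1-\pi)c]^{N-1}-(1-c)^{N-1},\;
(1-\pi c)^{N-1}-(1-c)^{N-1},\;
(1-c)^{N-1} \Bigr)
\tag{5.8}
$$

Response probabilities are easily determined using Eq. (5.2).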
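As a check on the Bernoulli-schedule result, the average matrix can be iterated directly and compared with the closed-form state probabilities. The sketch below assumes the convention $p_N = p_1 A_\pi^{N-1}$ (state probabilities at the start of trial N); the parameter values and function names are hypothetical.

```python
from fractions import Fraction


def bernoulli_matrix(pi, c):
    """Average matrix pi*P1 + (1-pi)*P2; state order LL, LU, UL, UU."""
    return [[1, 0, 0, 0],
            [(1 - pi) * c, 1 - (1 - pi) * c, 0, 0],
            [pi * c, 0, 1 - pi * c, 0],
            [0, pi * c, (1 - pi) * c, 1 - c]]


def iterated(n_trial, pi, c):
    """State probabilities on trial n_trial by repeated multiplication."""
    a = bernoulli_matrix(pi, c)
    v = [0, 0, 0, 1]
    for _ in range(n_trial - 1):
        v = [sum(v[i] * a[i][j] for i in range(4)) for j in range(4)]
    return v


def closed_form(n_trial, pi, c):
    """Closed-form state probabilities on trial n_trial."""
    m = n_trial - 1
    x = (1 - (1 - pi) * c) ** m   # bc (second item) still unlearned
    y = (1 - pi * c) ** m         # ab (first item) still unlearned
    z = (1 - c) ** m              # neither item learned
    return [1 + z - x - y, x - z, y - z, z]


pi, c = Fraction(19, 20), Fraction(1, 2)   # illustrative values
agree = all(iterated(n, pi, c) == closed_form(n, pi, c) for n in range(1, 8))
print(agree)
```

The three quantities x, y, and z are the survival probabilities of the individual items and of the pair, which is why the off-diagonal entries appear as differences.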

The tie-in to Atkinson and Estes' analysis comes from noting that
for $\pi = 1/2$ the matrix in Eq. (5.7) reduces to the matrix in Eq. (5.1),
i.e., $A_{1/2}$ is Eq. (5.1). Theorem 4.2 provides a justification for
considering powers of this matrix to get state probabilities under the
$\pi = 1/2$ Bernoulli presentation schedule. Atkinson and Estes' choice of a
single matrix determines the unit of analysis for the miniature list to be
the error-success process on the pair. From this they are able to show that
performance prior to the last error on the pair falls in the interval
(1/2, 5/8). This is because the stimulus not responsible for the last
error can either be learned or not prior to the last error on its
partner. Then the response rules specify the end points of the above
interval.

The method of analysis proposed in this paper has the advantage
that the items ab and bc can be analyzed separately. One consequence
is that the probability of an error response prior to the last error on
a particular item will be an increasing function of the trial index;
i.e., if n indexes the presentations of ab, $\Pr(x_n = 1 \mid L > n)$ is an
increasing function from a value of 1/2 to 3/4. This is because the
mixed model assumes that learning the patterns takes place independently
so, as n increases, the probability that bc is learned increases with
consequent negative transfer to ab. This result comes from the analysis
in this paper by noting that, under the Bernoulli presentation schedule
with $\pi = \Pr(ab)$,

$$
\Pr\bigl(t_N = (U,L) \mid t_N \in \{(U,U),(U,L)\}\bigr)
= \frac{(1-\pi c)^{N-1} - (1-c)^{N-1}}{(1-\pi c)^{N-1}}
\tag{5.9}
$$

where the appropriate probabilities from $p_1 A_\pi^{N-1}$ in Eq. (5.8) are
inserted into Eq. (5.9). Eq. (5.9) tends to 1 as N increases. Since
$L_{ab} \ge N$ (last error on ab at or after trial N) implies
$t_N \in \{(U,U), (U,L)\}$, the assertion of the preceding paragraph follows.
Finally, the generalization from $\pi = 1/2$ to $\pi \in (0,1)$ permits
an additional powerful test of the model. Depending on the value of c,
the probability of a correct response to item bc could even decrease
for large values of the parameter $\pi$. Using the response probabilities
from Eq. (5.2) and Eq. (5.8) yields

$$
\Pr(A_{2,N} \mid S_{2,N}) = 1 - \tfrac{3}{4}[1-(1-\pi)c]^{N-1} + \tfrac{1}{4}(1-c)^{N-1}
\tag{5.10}
$$

The preceding remark can be illustrated by plotting $\Pr(A_{2,N} \mid S_{2,N})$ for
$\pi = .95$ and $c = .5$, as shown in the figure.

Fig. Probability of correct to $S_2$ for mixed model, $\pi = .95$, $c = .50$.
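The non-monotone behaviour described above can be reproduced numerically. The sketch below is an illustration rather than the original computation: it combines the (L,U) and (U,U) state probabilities under the Bernoulli schedule with the response rule of Eq. (5.2), where a correct response to bc has probability 1/4 in (L,U) and 1/2 in (U,U), and the function name is hypothetical.

```python
def prob_correct_s2(n_trial, pi, c):
    """Probability of a correct response to bc (S2) on trial n_trial."""
    m = n_trial - 1
    bc_unlearned = (1 - (1 - pi) * c) ** m   # bc not yet learned
    none_learned = (1 - c) ** m              # state (U,U)
    p_lu = bc_unlearned - none_learned       # ab learned, bc unlearned
    # Errors occur with probability 3/4 in (L,U) and 1/2 in (U,U).
    return 1 - 0.75 * p_lu - 0.5 * none_learned


pi, c = 0.95, 0.5   # the parameter values quoted for the figure
curve = [prob_correct_s2(n, pi, c) for n in range(1, 201)]
lowest_trial = curve.index(min(curve)) + 1

for n in (1, 2, 3, 4, 5, 50, 200):
    print(n, round(curve[n - 1], 4))
print("lowest value on trial", lowest_trial)
```

With these values the curve starts at 1/2, dips below it over the first few trials as ab is learned and transfers negatively to bc, and then climbs toward 1.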
This concludes the section on the applicability of the framework
to the mixed model. Of course the framework could be used to get results
for other miniature lists. To recapitulate the advantages of applying
the framework to the analysis of the mixed model, we first note that
properties of the model such as commutativity are utilized by the
framework (Eq. (5.6)). Second, results from a generalized ($\pi \ne 1/2$)
Bernoulli presentation schedule fall directly from Theorem 4.2 (Eq. (5.8)).
And finally, statistics involving response probabilities to a particular
stimulus ($S_1$ or $S_2$) are easily obtained (Eqs. (5.9) and (5.10)).
Of course these results could be obtained without recourse to the
framework, but the compatibility of the framework and the model suggests that
there are dividends to be gained by an axiomatization of a model in
terms of Definition 4.3. Next, we turn to an analysis of the all-or-none
multi-level model for M = 2.

The All-or-none Multi-level Model (M = 2) 

Assume a list of pairs of related items for which related pairs are
assigned the same response. For example,

    Stimulus    Response
    ABC         1
    ADE         1
    FGH         2
    FIJ         2
    KLM         3
    KNO         3
might be such a list with three sublists of size 2 fitting the above
criterion. In Chapter 3, the all-or-none multi-level model for learning
such a list was axiomatized. The analyses of the model in Chapter 3
were restricted either to general theorems (Theorems 3.1, 3.2, 3.3) or
to the special case where c = r (Table 3.2). In this section a
further analysis of the model in terms of the framework will be presented.

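For the all-or-none multi-level model (M = 2) we have $T = \{U, P, R\}$,
$\mathcal{T} = T \times T$, and $\mathcal{N} = \{(U,R), (R,U), (P,R), (R,P)\}$,
where $\mathcal{N}$ is the set of null states (Definition 4.8). Thus, the state
space for the analysis is
$\mathcal{J} = \{(U,U), (U,P), (P,U), (P,P), (R,R)\}$.
$\mathcal{P} = \{P_1, P_2\}$, where

$$
P_1 =
\begin{array}{c|ccccc}
 & (R,R) & (P,P) & (P,U) & (U,P) & (U,U) \\ \hline
(R,R) & 1 & 0 & 0 & 0 & 0 \\
(P,P) & c & 1-c & 0 & 0 & 0 \\
(P,U) & c & 0 & 1-c & 0 & 0 \\
(U,P) & r & p & 0 & 1-r-p & 0 \\
(U,U) & r & 0 & p & 0 & 1-r-p
\end{array}
\tag{5.11}
$$

and

$$
P_2 =
\begin{array}{c|ccccc}
 & (R,R) & (P,P) & (P,U) & (U,P) & (U,U) \\ \hline
(R,R) & 1 & 0 & 0 & 0 & 0 \\
(P,P) & c & 1-c & 0 & 0 & 0 \\
(P,U) & r & p & 1-r-p & 0 & 0 \\
(U,P) & c & 0 & 0 & 1-c & 0 \\
(U,U) & r & 0 & 0 & p & 1-r-p
\end{array}
\tag{5.12}
$$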
It should be noted that the preceding state space and matrices really
apply to the sublists consisting of pairs of related items, i.e.,
the levels partition of the list consists of two-item equivalence classes
(Theorem 4.4). For example, the levels partition for the six-item list
on p. 98 is

    ({ABC, ADE}, {FGH, FIJ}, {KLM, KNO}).

The response rule asserts that responses to items in state U are
correct with probability g and incorrect with probability 1 - g;
whereas responses to items in states P and R are always correct.
Finally, the start vector is $p_1 = (0,0,0,0,1)$.

The model is commutative in the sense of Definition 4.7. This
fact can be seen by noting

$$
P_1 P_2 = P_2 P_1 =
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 \\
1-(1-c)^2 & (1-c)^2 & 0 & 0 & 0 \\
c+(1-c)r & p(1-c) & (1-c)(1-r-p) & 0 & 0 \\
c+(1-c)r & p(1-c) & 0 & (1-c)(1-r-p) & 0 \\
r(2-r) & p^2 & p(1-r-p) & p(1-r-p) & (1-r-p)^2
\end{bmatrix}
\tag{5.13}
$$

Now to apply Theorem 4.3, let us assume $k_1$ $S_1$ and $k_2$ $S_2$
presentations for the first N trials on a related pair of items
$\{S_1, S_2\}$, where $k_1 + k_2 = N$. Then

$$
\Pr(x_{1,N+1} = 0 \mid S_{1,N+1})
= p^{(1)}_{N+1} + p^{(2)}_{N+1} + p^{(3)}_{N+1}
+ g\,\bigl[p^{(4)}_{N+1} + p^{(5)}_{N+1}\bigr],
\tag{5.14}
$$

where $p^{(i)}_{N+1}$ is the probability of being in state i on trial N + 1
after the specified $S_1$ and $S_2$ presentations (we are assigning numbers
to states as follows: 1-(R,R), 2-(P,P), 3-(P,U), 4-(U,P), 5-(U,U)).

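Now by Theorem 4.3,

$$
p_{N+1} = p_1 P_1^{k_1} P_2^{k_2} = (0,0,0,0,1)
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 \\
1-(1-c)^{k_1} & (1-c)^{k_1} & 0 & 0 & 0 \\
1-(1-c)^{k_1} & 0 & (1-c)^{k_1} & 0 & 0 \\
B_1 & A_1 & 0 & (1-r-p)^{k_1} & 0 \\
B_1 & 0 & A_1 & 0 & (1-r-p)^{k_1}
\end{bmatrix}
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 \\
1-(1-c)^{k_2} & (1-c)^{k_2} & 0 & 0 & 0 \\
B_2 & A_2 & (1-r-p)^{k_2} & 0 & 0 \\
1-(1-c)^{k_2} & 0 & 0 & (1-c)^{k_2} & 0 \\
B_2 & 0 & 0 & A_2 & (1-r-p)^{k_2}
\end{bmatrix}
$$

$$
= \Bigl( B_1 + A_1 B_2 + (1-r-p)^{k_1} B_2,\;
A_1 A_2,\;
A_1 (1-r-p)^{k_2},\;
A_2 (1-r-p)^{k_1},\;
(1-r-p)^{k_1+k_2} \Bigr),
\tag{5.15}
$$

where

$$
A_i = \frac{p}{r+p-c}\,\bigl\{(1-c)^{k_i} - (1-r-p)^{k_i}\bigr\}
$$

and

$$
B_i = 1 - A_i - (1-r-p)^{k_i}.
$$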
The appropriate terms in Eq. (5.15) can be substituted into
Eq. (5.14) to obtain response probabilities as a function of $k_1$, $k_2$, c,
p, r. An experiment in which the presentation orders of the $S_1$ and
$S_2$ items are varied in position and in number should provide a strong test
for the all-or-none multi-level model.

Next, suppose a Bernoulli presentation schedule (Definition 4.6)
with $\pi = \Pr(S_1)$. Then from Theorem 4.2,

$$
A_\pi =
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 \\
c & 1-c & 0 & 0 & 0 \\
\pi c+(1-\pi)r & (1-\pi)p & \pi(1-c)+(1-\pi)(1-r-p) & 0 & 0 \\
\pi r+(1-\pi)c & \pi p & 0 & \pi(1-r-p)+(1-\pi)(1-c) & 0 \\
r & 0 & \pi p & (1-\pi)p & 1-r-p
\end{bmatrix}
\tag{5.16}
$$

$A_\pi$ could be raised to the Nth power to get state and response
probabilities for trial N. The result will not be presented here.

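The all-or-none multi-level model is a symmetric model (Definition
4.11). Hence, Theorem 4.5 can be used to lump $A_{1/2}$ into the states
$T_1 = \{(R,R)\}$, $T_2 = \{(P,P)\}$, $T_3 = \{(P,U), (U,P)\}$, $T_4 = \{(U,U)\}$. The
result is

$$
A_{1/2} =
\begin{array}{c|cccc}
 & T_1 & T_2 & T_3 & T_4 \\ \hline
T_1 & 1 & 0 & 0 & 0 \\
T_2 & c & 1-c & 0 & 0 \\
T_3 & \tfrac{1}{2}(r+c) & \tfrac{1}{2}p & 1-\tfrac{1}{2}(r+c+p) & 0 \\
T_4 & r & 0 & p & 1-r-p
\end{array}
\tag{5.17}
$$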
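The lumping step can be illustrated by checking the usual lumpability condition directly: every state in a block must send the same total probability to each block. The sketch below is an assumption-laden illustration (hypothetical function names, illustrative parameter values); it builds the average matrix of Eq. (5.16) and lumps it over the partition $T_1, \ldots, T_4$.

```python
from fractions import Fraction


def lump(matrix, blocks):
    """Return the lumped matrix, or None if the partition is not lumpable."""
    lumped = []
    for block in blocks:
        rows = [[sum(matrix[i][j] for j in target) for target in blocks]
                for i in block]
        if any(row != rows[0] for row in rows):
            return None  # states in this block disagree on block totals
        lumped.append(rows[0])
    return lumped


def average_matrix(c, r, p, pi):
    """pi*P1 + (1-pi)*P2; state order (R,R), (P,P), (P,U), (U,P), (U,U)."""
    s = 1 - r - p
    p1 = [[1, 0, 0, 0, 0], [c, 1 - c, 0, 0, 0], [c, 0, 1 - c, 0, 0],
          [r, p, 0, s, 0], [r, 0, p, 0, s]]
    p2 = [[1, 0, 0, 0, 0], [c, 1 - c, 0, 0, 0], [r, p, s, 0, 0],
          [c, 0, 0, 1 - c, 0], [r, 0, 0, p, s]]
    return [[pi * p1[i][j] + (1 - pi) * p2[i][j] for j in range(5)]
            for i in range(5)]


blocks = [[0], [1], [2, 3], [4]]                            # T1, T2, T3, T4
c, r, p = Fraction(1, 2), Fraction(1, 5), Fraction(1, 10)   # illustrative
half = lump(average_matrix(c, r, p, Fraction(1, 2)), blocks)
skewed = lump(average_matrix(c, r, p, Fraction(3, 4)), blocks)

print(half[2])   # T3 row: (r+c)/2, p/2, 1-(r+c+p)/2, 0
print(skewed)    # None for these values, since c differs from r
```

With these particular values the partition is lumpable for $\pi = 1/2$ but not for $\pi = 3/4$, which illustrates why the lumped chain is tied to the symmetric schedule.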
The response probabilities for this chain are as follows:

$$
\Pr(x_N = 0 \mid T_i) =
\begin{cases}
1 & \text{for } T_1 \text{ and } T_2 \\
\tfrac{1}{2}(1+g) & \text{for } T_3 \\
g & \text{for } T_4 .
\end{cases}
\tag{5.18}
$$



Since $T_1$ and $T_2$ are both perfect performance states in $A_{1/2}$,
there is a simpler equivalent three-state model. This is given by

$$
A'_{1/2} =
\begin{array}{c|ccc|c}
 & W_1 & W_2 & W_3 & \Pr(\text{correct} \mid \text{row state}) \\ \hline
W_1 & 1 & 0 & 0 & 1 \\
W_2 & \tfrac{1}{2}(r+c+p) & 1-\tfrac{1}{2}(r+c+p) & 0 & \tfrac{1}{2}(1+g) \\
W_3 & r & p & 1-r-p & g
\end{array}
\tag{5.19}
$$

where $W_1 = \{T_1, T_2\}$, $W_2 = \{T_3\}$, $W_3 = \{T_4\}$. $A'_{1/2}$ is a
two-stage model (cf. Bower and Theios, 1964). Analysis is facilitated by
expanding $W_2$ into an error and a success state (see p. 88, Chapter 4).

$A'_{1/2}$ represents the three-state stochastic matrix that corresponds
to the stochastic model governing the error-success process on the item
pair for a Bernoulli presentation schedule with $\pi = 1/2$, i.e., each
error-success protocol for the pair of items, $S_1$, $S_2$, is a sample path
from this process. Thus $A'_{1/2}$ represents a particular stochastic process
derived from the all-or-none multi-level model under the boundary
conditions of $\pi = 1/2$ and the level of analysis chosen as the pair of
items. Without dwelling on the point, there is a sense in which the
framework provides a method for axiomatizing a theory for list learning
in such a way that a particular model can be derived in accord with the
boundary conditions of the experiment. This property is a feature of
theories in physics, e.g., Newtonian mechanics.

An additional point can be made about a model viewed in terms of the
framework. The question of whether a theory is identifiable in the sense
of Greeno and Steiner cannot be answered, as such, by models in
the framework. The Greeno and Steiner analysis concerns the identifiability
of a model for a particular presentation schedule and a particular level
of analysis. Thus, a derivation, such as the model represented in
Eq. (5.19) for the pair of items, provides a stochastic process (or
model) which might or might not be identifiable in the sense of Greeno and
Steiner. However, some additional development of the theory of
identifiability is needed to apply it to a particular model written in
terms of the framework. No attempt to extend identifiability in the
indicated direction is presented in this paper.

Similar techniques can be used to handle the anticipation procedure.
On any cycle, either the presentation order $S_1 S_2$ or the order $S_2 S_1$
is presented to the subject. Since the model is commutative, the
effective matrix of transition probabilities for any cycle is given by
Eq. (5.13). Since (R,R) and (P,P) are perfect performance states, the
effective matrix on a cycle is lumpable to

$$
\begin{array}{c|cccc}
 & T_1 & T_2 & T_3 & T_4 \\ \hline
T_1 & 1 & 0 & 0 & 0 \\
T_2 & 1-(1-c)^2 & (1-c)^2 & 0 & 0 \\
T_3 & c+(1-c)r & p(1-c) & (1-c)(1-r-p) & 0 \\
T_4 & r(2-r) & p^2 & 2p(1-r-p) & (1-r-p)^2
\end{array}
\tag{5.20}
$$

where $T_1$, $T_2$, $T_3$, $T_4$ are defined as for Eq. (5.17). This stochastic
matrix represents the model for the analysis of the error-success
subsequences associated with whichever item appears first in a cycle.
Thus, if s is an error-success sequence for the item pair in an
anticipation procedure, the subsequence corresponding to the even terms in
s is an error-success sequence for the model in (5.20). The response
probabilities, given state $T_i$, are presented in Eq. (5.18).

In a similar manner to the way in which Eq. (5.19) represents an
equivalent model to Eq. (5.17), a three-state equivalent model (with
states $W_i$, i = 1, 2, 3) to Eq. (5.20) can be derived. The result is

$$
\begin{array}{c|ccc}
 & W_1 & W_2 & W_3 \\ \hline
W_1 & 1 & 0 & 0 \\
W_2 & (r+p)(1-c)+c & (1-c)(1-r-p) & 0 \\
W_3 & r(2-r)+p^2 & 2p(1-r-p) & (1-r-p)^2
\end{array}
\tag{5.21}
$$

Computations for this model would proceed similarly to those for the
model in Eq. (5.19).^ The point of interest is that the models in Eqs.
(5.19) and (5.21) are different models. Each is relevant to a different
presentation procedure and each applies to a different level of analysis;
however, both are derived from the all-or-none multi-level model. Next,
we present a slight modification of the all-or-none multi-level model
and indicate the direction of an analysis of this model in terms of the
framework.

^ It should be reiterated that models derived from Theorem 4.5 generally
have differential probabilities of learning following an error or a success
in a particular lumped state. This model is no exception. Analysis is
facilitated by expanding (5.21) into a $W_2$ error state and a $W_2$
success state.
Another Version of the All-or-none Multi-level Model

Thus far, we have reported an example where response probability to
an item depends on the states of other items in the list, and an example
where items other than the one presented can change their states. For
completeness we mention an extension of the all-or-none multi-level model
which displays the third type of dependency discussed, namely, the
transition probabilities for items may depend on the state of a particular
unpresented item.

Except for one modification, the model assumes the same structure
as the all-or-none multi-level model (M = 2). The probability of rule
learning is assumed to be c for any item presented, provided there is
at least one item in the list not in state U; otherwise it is assumed
to be r. For M = 2 the two transition matrices are displayed below:



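$$
P_1 =
\begin{array}{c|ccccc}
 & (R,R) & (P,P) & (P,U) & (U,P) & (U,U) \\ \hline
(R,R) & 1 & 0 & 0 & 0 & 0 \\
(P,P) & c & 1-c & 0 & 0 & 0 \\
(P,U) & c & 0 & 1-c & 0 & 0 \\
(U,P) & c & p & 0 & 1-c-p & 0 \\
(U,U) & r & 0 & p & 0 & 1-r-p
\end{array}
\tag{5.22}
$$

and

$$
P_2 =
\begin{array}{c|ccccc}
 & (R,R) & (P,P) & (P,U) & (U,P) & (U,U) \\ \hline
(R,R) & 1 & 0 & 0 & 0 & 0 \\
(P,P) & c & 1-c & 0 & 0 & 0 \\
(P,U) & c & p & 1-c-p & 0 & 0 \\
(U,P) & c & 0 & 0 & 1-c & 0 \\
(U,U) & r & 0 & 0 & p & 1-r-p
\end{array}
\tag{5.23}
$$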



This model has a sort of proactive feature to it in the sense
that previous presentations of other items can affect the probabilities
of rule learning to a particular item. The model is not a
commutative model. For M = 2, this is shown by computing $P_1 P_2$
and $P_2 P_1$. The result is

$$
P_1 P_2 =
\begin{array}{c|ccccc}
 & (R,R) & (P,P) & (P,U) & (U,P) & (U,U) \\ \hline
(R,R) & 1 & 0 & 0 & 0 & 0 \\
(P,P) & 1-(1-c)^2 & (1-c)^2 & 0 & 0 & 0 \\
(P,U) & 1-(1-c)^2 & p(1-c) & (1-c)(1-c-p) & 0 & 0 \\
(U,P) & 1-(1-c)^2 & p(1-c) & 0 & (1-c)(1-c-p) & 0 \\
(U,U) & 1-(1-r)^2+p(c-r) & p^2 & p(1-c-p) & p(1-r-p) & (1-r-p)^2
\end{array}
\tag{5.24}
$$

and

$$
P_2 P_1 =
\begin{array}{c|ccccc}
 & (R,R) & (P,P) & (P,U) & (U,P) & (U,U) \\ \hline
(R,R) & 1 & 0 & 0 & 0 & 0 \\
(P,P) & 1-(1-c)^2 & (1-c)^2 & 0 & 0 & 0 \\
(P,U) & 1-(1-c)^2 & p(1-c) & (1-c)(1-c-p) & 0 & 0 \\
(U,P) & 1-(1-c)^2 & p(1-c) & 0 & (1-c)(1-c-p) & 0 \\
(U,U) & 1-(1-r)^2+p(c-r) & p^2 & p(1-r-p) & p(1-c-p) & (1-r-p)^2
\end{array}
\tag{5.25}
$$

These two matrices differ in their (5,3) and (5,4) terms.
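The location of the discrepancy between the two products can be confirmed by direct multiplication. The sketch below uses illustrative parameter values and hypothetical function names; entries are reported with 1-based (row, column) indices in the state order (R,R), (P,P), (P,U), (U,P), (U,U).

```python
from fractions import Fraction


def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def modified_matrices(c, r, p):
    """(P1, P2) when rule learning has rate c unless both items are in U."""
    p1 = [[1, 0, 0, 0, 0],
          [c, 1 - c, 0, 0, 0],
          [c, 0, 1 - c, 0, 0],
          [c, p, 0, 1 - c - p, 0],
          [r, 0, p, 0, 1 - r - p]]
    p2 = [[1, 0, 0, 0, 0],
          [c, 1 - c, 0, 0, 0],
          [c, p, 1 - c - p, 0, 0],
          [c, 0, 0, 1 - c, 0],
          [r, 0, 0, p, 1 - r - p]]
    return p1, p2


def differing_entries(c, r, p):
    """1-based (row, column) positions where P1*P2 and P2*P1 disagree."""
    p1, p2 = modified_matrices(c, r, p)
    p12, p21 = matmul(p1, p2), matmul(p2, p1)
    return [(i + 1, j + 1) for i in range(5) for j in range(5)
            if p12[i][j] != p21[i][j]]


unequal = differing_entries(Fraction(1, 2), Fraction(1, 5), Fraction(1, 10))
equal = differing_entries(Fraction(1, 5), Fraction(1, 5), Fraction(1, 10))
print(unequal)
print(equal)   # setting c equal to r restores the commutative model
```

When c equals r the modification vanishes and the two products coincide, which is consistent with the commutativity of the unmodified model.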

This model, in miniature form, embodies some of the ideas currently
being worked on by G. Groen and L. Hyman (personal communication). They
are investigating the assumption that the probability a concept is learned
on any trial depends on the number of items in the list that have been
learned as paired associates. The above model reflects these
considerations by setting the concept learning parameter equal to one value if no
items have been learned as paired associates and a second value if any items
have been so learned. Further analysis of this model will not be
presented in this paper. With the exception of the lack of commutativity,
the analysis would proceed along the same lines as the all-or-none
multi-level model. Next we turn to an analysis of Restle's strategy-selection
theory within the framework of Chapter 4.

Strategy Selection Theory 

Restle's strategy-selection theory (Restle, 1962, 1964; Polson,
Restle, and Polson, 1965) has been mentioned in Chapter 4 (see, e.g., p. 65).
In this section we present one possible interpretation of his theory in
terms of the framework. As will be seen, there are two reasons why his
theory is an attractive one to analyze by our methods. The first is
that it provides a complement to the all-or-none multi-level model. The
multi-level model has the property that similar stimuli are paired with
the same response; whereas, in strategy-selection applications, similar
stimuli are paired with different responses. Thus, stimulus confusion
facilitates performance in the former situation and hinders it in the
latter. The second attraction to analyzing Restle's theory in terms of
the framework comes from noting that in Restle's applications of his
theory several approximations are made (Restle, 1964, pp. 132-144; Polson,
Restle, and Polson, 1965). Restle is aware of these approximations and even
suggests that a completely accurate analysis of his theory would require
a complicated Markov chain analysis involving the whole set of confusable
items and different transition matrices for different items (Restle, 1964,
pp. 168-171). The method of dealing with dependent items (in this case
confusable ones) in the framework appears to be similar to the method of
analysis Restle had in mind.

Restle applies strategy-selection theory to a number of experiments.
In applications, the theory takes the form of a finite state Markov chain.
The intermediate states of the model involve stimulus confusion or
response confusion. Since many of his applications are similar, the main
points of this section can be made in the context of the Polson, Restle,
and Polson (1965) experiment. Next, we turn to a description of the
experiment and model reported in that paper.

In the experiment, college students learned a 16-item paired-associate
list with 5 response alternatives by the anticipation procedure.
The stimuli were symbols such as a chess knight, a question mark, and
musical notes. The responses were common four-letter words. The major
manipulation was that 8 of the items were highly dissimilar; whereas
the other 8 items consisted of 4 highly confusable pairs, e.g., two very
similar Chinese words. Confusable stimuli were assigned different
responses.

The model assumed by Polson, Restle, and Polson had the property
that unique (non-confusable) S-R pairs would be learned by a two-stage
all-or-none model (the one-element P-level model); whereas the confusable
twin items would be learned by a three-stage model. The intermediate
stage was a stimulus confusion stage. More specifically, the model for
confusable pairs assumes three states, $S_0$, $S_I$, $S_L$, where $S_0$ is an
initial unlearned state with correct responses emitted with probability
p, $S_I$ is an intermediate confusion state where correct responses are
made with probability P and confusion responses (incorrect responses
which would be correct for the twin) with probability Q, and $S_L$ is a
final learned state. The transition matrix for the model is as follows:

$$
\begin{array}{c|ccc|c}
 & S_{L,n+1} & S_{I,n+1} & S_{0,n+1} & \Pr(\text{correct} \mid \text{row state}) \\ \hline
S_{L,n} & 1 & 0 & 0 & 1 \\
S_{I,n} & Qd & 1-Qd & 0 & P \\
S_{0,n} & cd & c(1-d) & 1-c & p
\end{array}
\tag{5.26}
$$

where it is understood that transitions take place from $S_I$ to $S_L$
only on confusion errors. Thus, c is the probability any strategy is
selected to an item in state $S_0$, and d is the probability a selected
strategy is not a confusion one. Resampling of strategies is assumed to
take place only on errors. The model in Eq. (5.26) is assumed to govern
the learning of a single confusable item, i.e., the model was applied to
a P-level analysis of twinned items in the Polson, Restle, and Polson paper.

There are several reasons why Eq. (5.26) does not adequately embody
some features of strategy-selection theory. To see these reasons, it
will be helpful to rewrite the model by expanding the intermediate
state into an intermediate error state, $S_I^-$, and an intermediate success
state, $S_I^+$. The result is

$$
\begin{array}{c|cccc|c}
 & S_L & S_I^- & S_I^+ & S_0 & \Pr(\text{correct} \mid \text{row state}) \\ \hline
S_L & 1 & 0 & 0 & 0 & 1 \\
S_I^- & d & (1-d)Q & (1-d)P & 0 & 0 \\
S_I^+ & 0 & Q & P & 0 & 1 \\
S_0 & cd & c(1-d)Q & c(1-d)P & 1-c & p
\end{array}
\tag{5.27}
$$

Strategy-selection theory postulates that once a strategy has been
sampled, resampling occurs only on an error. Careful analysis reveals that
Eq. (5.27) does not represent this assumption in a way that keeps harmony
with the theory. To see this point, let $S_1$ and $S_2$ denote a pair of
confusable stimuli. Suppose a confusion strategy, $h_1$, is learned when
$S_1$ appears. By its nature $h_1$ will produce correct responses to $S_1$
and errors to $S_2$. Since $h_1$ was learned when $S_1$ appeared, the
subject is now in state $S_I$ for item $S_1$; however, only on a trial when
$S_2$ appears, $h_1$ is tried with an error, and resampling occurs, will a
transition take place from $S_I$. The error that causes rejection of $h_1$
does not take place on a trial when $S_1$ is presented but on a trial when
$S_2$ appears. But (5.27) assumes that each subject-item protocol is a
sample path from this learning-only-on-errors model. The error that
causes learning is not in the protocol for $S_1$ and, thus, learning can
take place following a success to $S_1$ if an intermediate $S_2$ item
causes rejection of the confusion strategy learned when $S_1$ was
previously presented.

^ The reader who doubts that our treatment of strategy selection is a
fair interpretation is referred to Restle (1964), pp. 126-127. Actually,
it is this stimulus-specific interpretation of strategy sampling that
this writer finds so attractive about Restle's theory.
A possible way to rectify the state of affairs might be to use the
model represented in Eq. (5.27) to account for the error-success process
on the pair $\{S_1, S_2\}$, i.e., the level of analysis for which pairs of
items are the units. Restle (1964, p. 123) suggests this by arguing that
when stimulus generalization is considered, "the unit of analysis must be
the subset of related items as learned by a single subject." If the unit
of analysis for Eq. (5.27) is the item pair, the learning-only-on-errors
assumption is no longer in question. However, suppose $h_1$ is learned on
a trial when $S_1$ appears and is rejected when $S_2$ appears in favor of
a strategy which is unique to $S_2$. What strategy now covers $S_1$? The
answer is that $S_1$ is thrust back to the unlearned state, $S_0$, but this
has zero probability in Eq. (5.27). Polson, Restle, and Polson (1965)
point out this possibility and even note properties of the data to
indicate that such events did happen in their experiment.

One resolution to these problems would be to change the transition
probabilities in Eq. (5.27). This solution seems undesirable,
since the model already fails to reflect the nature of the intra-pair
dependencies postulated by strategy-selection theory. A better resolution
would be to attempt to embody these dependencies in a multi-level model
written in terms of the framework. This direction is very definitely
suggested by Restle (1964, pp. 168-171). One possible model embodying
strategy-selection assumptions for the Polson, Restle, and Polson experiment
is presented next.

Suppose the item state space, T, is $\{U, C_1, C_2, L\}$, where U is
an unlearned state, $C_i$ is a state where a confusion strategy requiring
response i is held (for i = 1, 2), and L is a learned state. After
removing null states, the state space for the list is as follows:

$$
\mathcal{J} = \{(U,U), (C_1,C_1), (C_2,C_2), (U,L), (L,U), (L,L)\}.
$$

The set of transition matrices consists of two matrices, $P_1$ and $P_2$;
they are as follows:

$$
P_1 =
\begin{array}{c|cccccc}
 & (L,L) & (L,U) & (U,L) & (C_2,C_2) & (C_1,C_1) & (U,U) \\ \hline
(L,L) & 1 & 0 & 0 & 0 & 0 & 0 \\
(L,U) & 0 & 1 & 0 & 0 & 0 & 0 \\
(U,L) & c & 0 & 1-c & 0 & 0 & 0 \\
(C_2,C_2) & 0 & d & 0 & 0 & 1-d & 0 \\
(C_1,C_1) & 0 & 0 & 0 & 0 & 1 & 0 \\
(U,U) & 0 & cd & 0 & 0 & c(1-d) & 1-c
\end{array}
\tag{5.28}
$$

and

$$
P_2 =
\begin{array}{c|cccccc}
 & (L,L) & (L,U) & (U,L) & (C_2,C_2) & (C_1,C_1) & (U,U) \\ \hline
(L,L) & 1 & 0 & 0 & 0 & 0 & 0 \\
(L,U) & c & 1-c & 0 & 0 & 0 & 0 \\
(U,L) & 0 & 0 & 1 & 0 & 0 & 0 \\
(C_2,C_2) & 0 & 0 & 0 & 1 & 0 & 0 \\
(C_1,C_1) & 0 & 0 & d & 1-d & 0 & 0 \\
(U,U) & 0 & 0 & cd & c(1-d) & 0 & 1-c
\end{array}
\tag{5.29}
$$

where the following special assumptions have been made: (1) if an item is
presented and a confusion strategy is learned, it applies equally to both
items if they were previously unlearned, (2) if one item is learned and
the other is not, any strategy learned on a trial when the unlearned item
is presented is sufficient to move the pair into state (L,L), and (3) on
an error trial to a confusion strategy, only the presented item can be
learned and, if so, its twin goes to state U. These assumptions appear
to be in the spirit of strategy-selection theory but by no means
represent the only way strategy-selection theory could be formalized in
terms of the framework. The response rule would specify that items
in U would be responded to correctly with probability p, in L with
probability 1, and in state $C_i$ the response correct for $S_i$ is
always made.

The model is not a commutative model, since $P_1 P_2 \ne P_2 P_1$.
The results of the matrix multiplications are as follows:

$$
P_1 P_2 =
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
c & 1-c & 0 & 0 & 0 & 0 \\
c & 0 & 1-c & 0 & 0 & 0 \\
cd & d(1-c) & d(1-d) & (1-d)^2 & 0 & 0 \\
0 & 0 & d & 1-d & 0 & 0 \\
c^2 d & cd(1-c) & cd(2-c-d) & c(1-d)(2-c-d) & 0 & (1-c)^2
\end{bmatrix}
\tag{5.30}
$$

$$
P_2 P_1 =
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
c & 1-c & 0 & 0 & 0 & 0 \\
c & 0 & 1-c & 0 & 0 & 0 \\
0 & d & 0 & 0 & 1-d & 0 \\
cd & (1-d)d & d(1-c) & 0 & (1-d)^2 & 0 \\
c^2 d & cd(2-c-d) & cd(1-c) & 0 & c(1-d)(2-c-d) & (1-c)^2
\end{bmatrix}
\tag{5.31}
$$
The anticipation procedure requires that the two possible orders of
presentation, $S_1 S_2$ and $S_2 S_1$, are equally likely. In order to apply
Theorem 4.2 we compute the average effective matrix, A, on a cycle
(see Theorem 4.2). The result is










(5.32)  $A = \tfrac{1}{2} P_1 P_2 + \tfrac{1}{2} P_2 P_1 =$
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cd 
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d(2-c) 
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^( 3 - 2 o-a) 


^(3-2c-a) 


c(l-d) (2-c-d) 


c(l-d)(2-c-d) 


( 1 - 0 ) 
Since this is a symmetric model, Theorem 4.5 can be used to lump A
to a four-state matrix with states $T_1 = \{(L,L)\}$, $T_2 = \{(L,U), (U,L)\}$,
$T_3 = \{(C_1,C_1), (C_2,C_2)\}$, $T_4 = \{(U,U)\}$. The result is



T. 



( 5 . 53 ) 
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T, 
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T, 
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1-c 
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~ §(3-C-d) 
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r 



1 



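The probability of a correct response given state $T_i$ is as follows:

(5.34)  $\Pr(x_n = 0 \mid T_i) = \begin{cases} 1 & \text{if } i = 1 \\ \tfrac{1}{2}(1+g) & \text{if } i = 2 \\ \tfrac{1}{2} & \text{if } i = 3 \\ g & \text{if } i = 4 \end{cases}$
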

The model represented by Eqs. (5.33), (5.34) would apply to the
error-success sequences on items $S_1$ or $S_2$ which appear first on
each cycle (see p. 105 of this chapter for a further description of this
level of analysis). This is because between two successive first
appearances, each of the matrices, $P_1 P_2$ and $P_2 P_1$, is equally likely
to be effective. Restle assumes items start in state U, so the initial
state vector is $\vec{p} = (0, 0, 0, 1)$ for this model. Since the data for
first-appearing items are not presented in Polson, Restle, and Polson, no
attempt will be made to present statistics for this model. It should yield
to hand computations of some statistics, or it could be analyzed by
computer, using Bernbach's (1966) scheme. Intuition suggests that the
pattern of predictions for this model should conform to data as well as or
better than the model presented in Polson, Restle, and Polson. There are
two reasons for this intuition: (1) items can drop from a confusion state
to state U, and there are indications in the data that this happened, and
(2) the model is an average of a convolution of two geometric
distributions and a convolution of three geometric distributions. Since a
convolution of two geometrics does not do badly, it is unlikely that the
addition of another stage will hurt prediction. The case is not, however,
entirely transparent.
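The second reason can be made concrete with a small numerical sketch. The code below builds the convolution of two and of three geometric distributions and averages them. It is only an illustration: the function names, the stage rates, and the equal mixing weight are assumptions introduced here, not quantities taken from the model above.

```python
def geometric_pmf(p, n_max):
    """Pr(K = k) = p (1 - p)^(k - 1) for k = 1..n_max; index 0 holds 0.0."""
    return [0.0] + [p * (1 - p) ** (k - 1) for k in range(1, n_max + 1)]


def convolve(a, b):
    """Distribution of the sum of two independent counts, truncated to len(a)."""
    out = [0.0] * len(a)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < len(out):
                out[i + j] += ai * bj
    return out


def mixture(a, b, weight=0.5):
    """Weighted average of two distributions defined on the same support."""
    return [weight * x + (1 - weight) * y for x, y in zip(a, b)]


# Hypothetical stage rates, chosen only for illustration.
N_MAX = 40
stage_1 = geometric_pmf(0.3, N_MAX)
stage_2 = geometric_pmf(0.4, N_MAX)
stage_3 = geometric_pmf(0.5, N_MAX)

two_stage = convolve(stage_1, stage_2)       # convolution of two geometrics
three_stage = convolve(two_stage, stage_3)   # convolution of three geometrics
averaged = mixture(two_stage, three_stage)   # equal weights are an assumption
```

The two-stage component places no mass below k = 2 and the three-stage component none below k = 3, so the averaged distribution differs from a pure two-stage distribution mainly in having a somewhat heavier right tail.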



The model for the error-success process on the second-appearing
item in a cycle is slightly more complicated. This is because this item
is always different from the item associated with the last effective
matrix, i.e., if $S_2$ appears second on some cycle, then $P_1$,
corresponding to $S_1$, which appeared first, is the last effective matrix.
If one uses the average matrix, A, as in Eq. (5.33), it is assumed
not only that $P_1 P_2$ and $P_2 P_1$ are equally likely, but also that
$S_1$ and $S_2$ appear equally likely and independent of whether $P_1 P_2$
or $P_2 P_1$ was effective. This assumption is violated for second-appearing
items but not for first-appearing items on a cycle. There are several
ways second-appearing items can be handled, but the details will not be
presented here. One way would be to consider the arrangements
and possibilities for effective matrix and
item-presentation for second-appearing items. By incorporating the
presented item into the state space (e.g., a state might be $(U, L, S_1)$),
a model for second-appearing items could be derived.

Additional results and statistics for different presentation
schedules and levels of analysis could be presented for strategy-selection
theory as interpreted by the framework. These will not be presented
in this paper. It is hoped that this section has indicated the
direction that a mathematical theory for confusion processes in list
learning might take. This section concludes our analysis of models in
terms of the framework. We have seen how the theorems of Chapter 4 can
be applied to a variety of multi-level models embodying various sorts of
item dependencies. The net value of the framework depends entirely on
its ability to generate new and tractable tests for learning models.
CHAPTER 6



EXPERIMENTS AND CONCLUSIONS 

In the first part of this chapter we will discuss two experiments
that the writer has conducted to generate some data relevant to the ideas
and methods of analyses discussed in Chapters 2 and 3. Since these
experiments represent only the start of a program to pursue experimentally
the ideas in those chapters, their presentation has been postponed to
this last chapter, which is designed to indicate plans for developing
and extending the ideas in this paper. In the last part of the chapter
we will indicate briefly some general directions that research motivated
by the ideas in Chapters 2 and 3 might take.

Experiments 

Before presenting the two experiments, it will be useful to describe
the general paradigm that governs the design of both. The paradigm
involves list learning. The stimulus terms are composed of recognizable
components with some number N of these components per stimulus (in
the experiments to be reported, N = 3). There are fewer response terms
than stimulus terms, and hence, more than one stimulus is paired with
each response.

Some of the components making up a stimulus are unique in the sense
that they only appear as components of that stimulus, whereas other
components are shared by more than one stimulus. The major manipulation in
the paradigm is to construct stimuli and assign responses in such a way
that all stimuli sharing any component (or components) are paired with
the same response. Thus, shared ("overlap") components should aid the
subject in the sense that they will never lead him astray in his responses,
i.e., if the subject pairs a certain component x to response A and
hence says response A to any stimulus having component x, he will
always be correct. The following is a possible structure of a typical
list used in the experiments to be reported:





        Stimulus                        Response

    Carl    Stan    Eric                   1
    Carl    Dave    Robert                 1
    Carl    George  Jim                    1
    Jack    Bill    Bob                    1
    Jerry   Dick    Pat                    3
    Jerry   Frank   Louis                  3
    Jerry   Mike    Guy                    3
    Tom     Harry   Glen                   3
    etc.





It should be noted that the only overlap components are Carl and Jerry,
and, further, if the subject pairs any component with a number response,
he will get the stimulus having that component correct as well as any
other stimulus (if any) sharing that component.
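The defining property of the paradigm, that no component is ever paired with two different responses, can be checked mechanically. The following is a minimal sketch using the example list; the function names are introduced here for illustration, and the response numbers for the Jerry stimuli are assumed to be 3 throughout, as the same-response rule requires.

```python
from collections import defaultdict

# Example list: each stimulus is a triple of component words paired with a
# number response. Responses for the Jerry rows are assumed to be 3.
EXAMPLE_LIST = [
    (("Carl", "Stan", "Eric"), 1),
    (("Carl", "Dave", "Robert"), 1),
    (("Carl", "George", "Jim"), 1),
    (("Jack", "Bill", "Bob"), 1),
    (("Jerry", "Dick", "Pat"), 3),
    (("Jerry", "Frank", "Louis"), 3),
    (("Jerry", "Mike", "Guy"), 3),
    (("Tom", "Harry", "Glen"), 3),
]


def component_responses(pairs):
    """Map each component word to the set of responses it is paired with."""
    table = defaultdict(set)
    for components, response in pairs:
        for word in components:
            table[word].add(response)
    return dict(table)


def overlap_components(pairs):
    """Return the components that appear in more than one stimulus."""
    counts = defaultdict(int)
    for components, _ in pairs:
        for word in components:
            counts[word] += 1
    return sorted(word for word, k in counts.items() if k > 1)


def is_facilitative(pairs):
    """True if every component is paired with exactly one response."""
    return all(len(r) == 1 for r in component_responses(pairs).values())
```

A concept-identification list, in which two stimuli may share a component and yet receive different responses, would fail the `is_facilitative` check.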

The list structure for this paradigm is similar to that frequently
employed to study concept identification (e.g., Atkinson, Bower, and
Crothers, 1965, p. 51); however, there is one essential difference in
the two paradigms. This difference is that overlap components in a
concept identification task are not always facilitative; that is, two
stimuli can share a component and yet be assigned different responses. Our
paradigm is even more different from that employed by Polson, Restle, and
Polson (1965) to study confusion processes in paired-associate learning.
In their study, stimuli sharing common components were always assigned
different responses (see pp. 109-110 for a discussion of their paradigm).

By using the paradigm described in this chapter, it was hoped that
positive inter-item transfer within the list would result from the
facilitative nature of the overlap components. As will be seen, this
expectancy was borne out in the data. Additional motivations for the
experiments were to gather data relevant to the levels analyses discussed in
Chapter 2 and possibly to fit the all-or-none multi-level model to these
data. However, only some of these latter expectancies materialized.
Next, we turn to a discussion of the two experiments.



Experiment I 



Method 

Subjects. The Ss were 15 male and female undergraduate and
non-psychology graduate students at Stanford University. Each S was paid
$1.50 for his participation in the experiment. The data for all Ss were
used. The initial plan was to run 50 Ss in the experiment; however,
the task proved so easy that only certain statistics, requiring far fewer
than 50 Ss for stability, were usable.

Apparatus and Materials.--Subjects were run one at a time.
Presentation was by hand. The E sat facing the S behind a 1 x 2 ft. screen
and placed 5 x 5 inch cards on a 3 x 8 inch metal card rack situated to
the E's right of the screen.

The materials consisted of three decks of twelve stimulus cards.
Each card in the experiment was composed of three component words arranged
in a triangular fashion on a card, i.e., if x, y, z were the three
components, a typical arrangement on a card might be

            y

        x       z

The responses for a particular deck were either the numbers {3,5,7,9}
or the numbers {2,4,6,8}.

The twelve stimulus cards in each deck were partitioned into four
sets of three stimuli per set. Each set was assigned to a different one
of the four response numbers. Each set of three stimuli in the
experiment had one of the following three structures: (1) all three stimuli
shared exactly one common component word, (2) two of the three stimuli
shared a common component word, and (3) none of the stimuli shared a
component word. Denote these three structures by $C_3$, $C_2$, and $C_0$,
respectively. With the exception of the overlap components possible in
a $C_3$ or $C_2$ structure, all other components for a particular deck were
unique, i.e., appeared only on a single stimulus.

Deck (list) one consisted of animal names as the components, e.g.,
toad, mole, badger, and consisted of 2 $C_3$ and 2 $C_0$ sets. Lists two
and three had the following structure. One of the lists had a 2 $C_3$,
1 $C_2$, 1 $C_0$ structure, and the other list had a 1 $C_3$, 2 $C_2$, 1 $C_0$
structure. The components for a particular one of these lists were either
all common, short, boys' first names, e.g., Jim, Bill, Dick, or common,
short, girls' first names, e.g., Patty, Ann, Margie. Each of the two orders
for presenting the two lists was given to half the Ss.
Procedure.--Each S received training trials on each of the three
lists. Presentation was by the paired-associate anticipation method.
The inter-item interval was short, with a mean of about 1 sec. (range
about .5 to 1.5 sec.). The break between cycles (randomizations) was
noticeable and about 5 sec., and the break between lists was about 2
minutes. For the first list, Ss were run either to a criterion of one
errorless cycle through the list or 8 complete cycles, whichever occurred
first; however, for Lists II and III, they were run to a criterion of
two errorless cycles. Upon the presentation of a particular stimulus
card, the S, at his leisure, gave orally one of the four number
responses; immediately thereafter the E told him the correct number for
that stimulus.

The arrangement of components on a card was counterbalanced, both
for a single S and from S to S. Within a given cycle through the
list, an overlap component never appeared twice in the same position (this
was accomplished by having three randomizations of each list available
to the E). Finally, to further minimize recognition of the overlap
components, presentation orders were arranged in such a way that two
stimuli sharing a common component never appeared adjacent in a cycle.

Ss were given brief paired-associate instructions and were told that
the spatial arrangement of a particular set of component words on a card
might change from cycle to cycle. Following the third list the S was
given a paper and pencil task to see how many of the component-number
pairings he could remember. The S was required to fill in a response
number in the blank opposite each component word.
Results and Discussion

This section will present results for List I (2 $C_3$ and 2 $C_0$ sets)
first, followed by the analysis of Lists II and III. The major results
for List I can be seen from a P-level analysis of the data using some
of the statistics discussed in Chapter 2 (see pp. 16-18). By way of
preview, these statistics are as follows: (1) the learning curve,
$\Pr(x_n = 1)$; (2) the mean total errors, E(T), and the mean trial number
of the last error, E(L), and the probability distributions of these two
statistics; and (3) the probability of an error prior to the last error,
$\Pr(x_n = 1 \mid L > n)$, and the probability of an error given error curve,
$\Pr(x_{n+1} = 1 \mid x_n = 1)$. These three classes of statistics are
presented for the $C_3$ stimulus sets and $C_0$ stimulus sets separately.
Figure 6.1 presents $\Pr(x_n = 1)$, Table 6.1 presents E(T) and E(L),
Fig. 6.2 presents the distributions of T and L, and Figs. 6.3, 6.4 present
$\Pr(x_{n+1} = 1 \mid x_n = 1)$ and $\Pr(x_n = 1 \mid L > n)$. It should be
reiterated that these statistics are computed for a P-level analysis.
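For readers who wish to compute these statistics from raw data, the following sketch shows one way to obtain each of them from a set of error-success protocols. It assumes each protocol is a list of 0s (correct) and 1s (errors) of equal length, with trials after criterion filled in as correct; the function names and the toy protocols are illustrative assumptions and are not data from the experiment.

```python
def learning_curve(protocols):
    """Pr(x_n = 1): proportion of errors on each trial n."""
    n_trials = len(protocols[0])
    return [sum(p[n] for p in protocols) / len(protocols) for n in range(n_trials)]


def total_errors(protocol):
    """T: total number of errors in a protocol."""
    return sum(protocol)


def last_error_trial(protocol):
    """L: trial number (1-indexed) of the last error, 0 if there is none."""
    last = 0
    for n, x in enumerate(protocol, start=1):
        if x == 1:
            last = n
    return last


def error_before_last_error(protocols):
    """Pr(x_n = 1 | L > n) for each n; None where no protocol has L > n."""
    n_trials = len(protocols[0])
    out = []
    for n in range(1, n_trials + 1):
        eligible = [p for p in protocols if last_error_trial(p) > n]
        out.append(sum(p[n - 1] for p in eligible) / len(eligible) if eligible else None)
    return out


def error_given_error(protocols):
    """Pr(x_{n+1} = 1 | x_n = 1) for each n; None where no error occurred on n."""
    n_trials = len(protocols[0])
    out = []
    for n in range(1, n_trials):
        eligible = [p for p in protocols if p[n - 1] == 1]
        out.append(sum(p[n] for p in eligible) / len(eligible) if eligible else None)
    return out


# Hypothetical protocols for four items over four trials (1 = error).
toy_protocols = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
]
mean_T = sum(total_errors(p) for p in toy_protocols) / len(toy_protocols)
mean_L = sum(last_error_trial(p) for p in toy_protocols) / len(toy_protocols)
```

The conditional statistics return None on trials where the conditioning event never occurs, which is the situation that arises when learning is rapid and few protocols still contain errors on later trials.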

First, it is quite evident from the learning curve (Fig. 6.1) and
from the mean total errors and mean trial number of the last error (Table
6.1) that $C_3$ stimuli (stimuli with an overlap component) were learned
more rapidly than $C_0$ stimuli. Also, there is evidence that the process
governing $C_3$ learning produced qualitatively different data from the
data for $C_0$. In Fig. 6.1, the $C_3$ learning curve is not badly fit by
an exponential function; however, the learning curve for $C_0$ stimuli is
more S-shaped. This difference could reflect the fact that Ss learned
to recognize and attend to the overlap components to the detriment of
stimuli in $C_0$ sets not having these components. A qualitative difference
Table 6.1. Mean Total Errors, E(T),
and Mean Trial Number of Last
Error, E(L), for $C_3$ and $C_0$ (List I).



E(T) 

E(L) 



2. 12 


1 


3.57 

i. ' 


5.25 
Fig. 6.2. Total Error Distributions, T, and Trial Number of
Last Error Distributions, L, for $C_3$, $C_0$, List I.
Figs. 6.3 and 6.4. Pr(error on n prior
to last error) and Pr(error on n + 1 | error
on n), respectively, for $C_3$ and $C_0$
(List I). Data are based on from 83 to 33
cases (mostly more than 50).
in the data for $C_3$ and $C_0$ is also seen in the total error and trial
number of the last error distributions (Fig. 6.2). The $C_3$ distribution
appears somewhat geometric, although limited data prevent a sharp
resolution of this point. On the other hand, the $C_0$ distributions are
definitely not geometric.

Thus far, it appears that for $C_0$ we can reject the one-element
P or R level model and also the all-or-none multi-level model, since
these models all predict exponential learning curves and geometric T
and L distributions for the P-level of data analysis (see Tables 2.1
and 3.2, p. 18 and p. 59, respectively). Moreover, the $\Pr(x_n = 1 \mid L > n)$
and $\Pr(x_{n+1} = 1 \mid x_n = 1)$ curves (Figs. 6.3 and 6.4) make it unlikely
that any of these three models could account for the $C_3$ data. Both curves
tend to decrease over trials, whereas all three models predict that they
should be flat. Thus, it appears that processes more complicated than
all-or-none P and R level mechanisms are needed to account for the
data from List I.
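The flat-curve prediction can be verified directly for the simplest of these models. The sketch below assumes the standard one-element model, in which an unlearned item is guessed correctly with probability g and becomes learned with probability c after each trial; it enumerates every possible protocol exactly rather than simulating. The function names and parameter values are illustrative assumptions.

```python
from itertools import product


def protocol_distribution(c, g, n_trials):
    """Exact probability of every error protocol (1 = error, 0 = correct)."""
    dist = {}
    for k in range(1, n_trials + 1):
        # k = number of observed trials on which the item is still unlearned;
        # k = n_trials also covers items never learned within the horizon.
        p_k = (1 - c) ** (k - 1) * (c if k < n_trials else 1.0)
        for guesses in product((0, 1), repeat=k):
            p = p_k
            for x in guesses:
                p *= (1 - g) if x == 1 else g
            protocol = guesses + (0,) * (n_trials - k)
            dist[protocol] = dist.get(protocol, 0.0) + p
    return dist


def last_error(protocol):
    """Trial number (1-indexed) of the last error; 0 if there is none."""
    return max((n for n, x in enumerate(protocol, 1) if x == 1), default=0)


def conditional(dist, event, given):
    """Pr(event | given) computed over the protocol distribution."""
    num = sum(p for pr, p in dist.items() if given(pr) and event(pr))
    den = sum(p for pr, p in dist.items() if given(pr))
    return num / den


c, g, n_trials = 0.3, 0.25, 6  # hypothetical parameter values
dist = protocol_distribution(c, g, n_trials)

# Pr(x_{n+1} = 1 | x_n = 1) for n = 1..5; the model gives (1 - c)(1 - g).
error_given_error = [
    conditional(dist, lambda pr, n=n: pr[n] == 1, lambda pr, n=n: pr[n - 1] == 1)
    for n in range(1, n_trials)
]

# Pr(x_n = 1 | L > n) for n = 1..5; the model gives 1 - g.
error_before_last = [
    conditional(dist, lambda pr, n=n: pr[n - 1] == 1, lambda pr, n=n: last_error(pr) > n)
    for n in range(1, n_trials)
]
```

Both lists are constant over n under these assumptions, which is why curves that decrease over trials count as evidence against the model.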

The picture becomes more complicated in light of the R-level
analyses. None of these analyses (which will not be given in detail here)
revealed anything approaching a significant tendency for R-level
learning (in the sense of Chapter 2) for $C_3$ stimuli. The R-level learning
curve was essentially flat within a cycle and the P-level error-success
protocols for $C_3$ showed no notable intercorrelations (see p. 20 and
p. 21, Chapter 2). This lack of R-level learning could be reflected
in the rapid learning of $C_3$ stimuli. Thus the S might not have had
a chance before reaching criterion to manifest significant transfer
effects by these analyses. However, the difference in learning rate of
$C_3$ and $C_0$ stimuli strongly indicates that the overlap components were
effective in cutting down errors to $C_3$ stimuli.

List I was designed as a warm-up task for Lists II and III. It
was hoped that the S would have a fair idea of the structure of the
stimulus classes after his encounter with List I, and would therefore
perform in a stable fashion on Lists II and III. Next, we move to an
analysis of these two lists.

Apparently there was no significant difference in the learning
rate between Lists II and III (the numbers refer to the list the S saw
2nd and 3rd; the lists are discussed on p. 121). Nor was there any
tendency to learn the list having structure 2 $C_3$, 1 $C_2$, 1 $C_0$ any
faster or slower than the 1 $C_3$, 2 $C_2$, 1 $C_0$ list. There was,
however, a slight tendency to learn stimuli with boys' names as components
slightly slower than stimuli with girls' names. Since the component type
was randomized both for list order and list type, the data from Lists II
and III were combined for analysis despite this slight differential
learning rate on component type. All $C_3$ stimuli, all $C_2$ stimuli sharing a
component (i.e., $C_2'$ stimuli), all $C_2$ stimuli with all unique components
(i.e., $C_2''$ stimuli), and all $C_0$ stimuli were pooled into four classes
for a P-level analysis. These classes had 135, 90, 45, and 90 protocols
in each class, respectively.
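The protocol counts follow from the design by simple arithmetic, sketched below. The class labels and the function name are introduced here for illustration; the count for the unique-component $C_2$ class is derived from the stated design (15 Ss, two lists, sets of three stimuli) rather than read from the text.

```python
N_SUBJECTS = 15

# Number of C3, C2, and C0 sets in each of the two pooled lists.
LIST_STRUCTURES = [
    {"C3": 2, "C2": 1, "C0": 1},
    {"C3": 1, "C2": 2, "C0": 1},
]


def protocols_per_class(n_subjects, structures):
    """Count P-level protocols per stimulus class, pooled over lists and Ss."""
    per_subject = {"C3": 0, "C2_shared": 0, "C2_unique": 0, "C0": 0}
    for s in structures:
        per_subject["C3"] += 3 * s["C3"]          # all three stimuli share a word
        per_subject["C2_shared"] += 2 * s["C2"]   # two stimuli share a word
        per_subject["C2_unique"] += 1 * s["C2"]   # third stimulus is all unique
        per_subject["C0"] += 3 * s["C0"]          # no shared words
    return {name: count * n_subjects for name, count in per_subject.items()}


counts = protocols_per_class(N_SUBJECTS, LIST_STRUCTURES)
```

Each list contributes 12 stimuli per S, so the four counts must sum to 15 x 24 = 360.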

The P-level learning curves for the four classes are presented in
Fig. 6.5 and the mean total errors and mean trial number of the last
error are presented in Table 6.2. Finally, the distribution of the total
error statistic is presented in Fig. 6.6. Learning was so rapid for Lists
II and III that $\Pr(x_n = 1 \mid L > n)$ and $\Pr(x_{n+1} = 1 \mid x_n = 1)$ were not
Fig. 6.6. Total Error Distributions for $C_3$, $C_2'$, $C_2''$, $C_0$
for Lists II, III.
Table 6.2. Mean Total Errors, E(T), and
Mean Trial Number of Last Error, E(L),
for $C_3$, $C_2'$, $C_2''$, and $C_0$ (Lists II and III).

              $C_3$     $C_2'$    $C_2''$   $C_0$

    E(T)      1.21      1.50      1.84      1.62
    E(L)      1.46      1.62      2.22      2.07
sufficiently stable to warrant their inclusion. Other things being equal,
these two statistics tended to decrease over trials.

The learning curve analysis (Fig. 6.5) reveals at least two things.
First, stimuli with overlap components were learned significantly faster
than stimuli without these components. Second, learning was very rapid
in the experiment with only about 15% or less errors per trial on and
beyond trial 5. Closer analysis reveals that the $C_3$ and $C_2'$ curves
drop faster than an exponential function during early trials. This can
be seen since the first decrement in the error probability was greater
than 50%, whereas later decrements tended to be less than this. The $C_0$
and $C_2''$ curves are more closely approximated by an exponential function.
It seems possible that the overlap components were both identified and
paired with responses on the first cycle, whereas they were already
identified for later cycles and possibly ignored by some Ss. Interviews
did indicate some conscious ignoring of overlap components by some Ss.
A section of the R-level learning curve to follow (Fig. 6.7) bears on
this recognition and pairing hypothesis.
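The decrement argument rests on the fact that an exponential learning curve loses the same fraction of its error probability on every trial. The following sketch makes the comparison explicit; the function names and both example curves are hypothetical and are not the observed data.

```python
def proportional_decrements(curve):
    """Fractional drop in error probability between successive trials.

    Assumes every entry of the curve except possibly the last is positive.
    """
    return [(a - b) / a for a, b in zip(curve, curve[1:])]


def exponential_curve(q1, theta, n_trials):
    """Error probabilities q1 * (1 - theta)^(n - 1) for n = 1..n_trials."""
    return [q1 * (1 - theta) ** n for n in range(n_trials)]


# An exponential curve has a constant proportional decrement (here 0.4).
flat = proportional_decrements(exponential_curve(0.75, 0.4, 5))

# A hypothetical curve that drops by more than half at first, then slowly.
steep_then_slow = proportional_decrements([0.60, 0.20, 0.15, 0.12])
```

A first decrement well above the later ones, as in the second example, is the pattern described above for the overlap classes.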

The fact that learning was quite rapid for these two 12-item lists
is even more strikingly seen in Table 6.2. The mean total errors for
each class were less than two. The total error distributions in Fig. 6.6
reveal that, in each of the four cases, geometric-like distributions are
obtained; however, rapid learning and small N make it difficult to
discriminate between a geometric distribution and one that just drops as
k increases. These distributions reveal the differential difficulty in
$C_3$ and $C_2'$ vs. $C_0$ and $C_2''$ classes. The fact that Pr(T = 0) is
greater for $C_3$ than for $C_2'$ might indicate more transfer from stimulus
to stimulus during cycle 1 when three stimuli share a common component as
opposed to just two. This transfer within cycle 1 is illustrated in the
R-level learning curve to be presented later (Fig. 6.7).

A comparison of the overlap classes and non-overlap classes ($C_3$, $C_2'$
vs. $C_0$, $C_2''$) both on their total error distributions (Fig. 6.6) and
their learning curves (Fig. 6.5) indicates the nature of the
learning-to-learn effects developed in the experiment. The List I data indicate
that trial 1 had little direct effect on $C_0$ stimuli, whereas trial 1 had
the biggest effect on cutting down errors to $C_0$ stimuli for Lists II and
III. Also, the $C_0$ total error distribution is definitely not geometric for
List I and apparently geometric-like for Lists II and III. These
differences are attributed to the Ss' increased familiarity with the paradigm
for Lists II and III, i.e., the S learned to expect some but not all
overlap components and to use them. The post-List III recall task
indicated that Ss remembered the component-response pairing for 85% of the
overlap (relevant) components and only about [figure illegible] of the
irrelevant components (corrected for guessing). Since it was necessary to
learn a minimum of [figure illegible] of the irrelevant components to master
the list, this measure indicates that not too much learning above the
minimum necessary took place.

Another difference between List I vs. Lists II and III is revealed by
the R-level analysis. The small number of Ss and few errors prohibit a
full R-level analysis; however, there were significantly fewer errors
made to the 3rd appearing stimulus on cycle 1 than were made to the
1st and 2nd stimulus in a class. This fact is shown in Fig. 6.7,
which presents a section of the R-level learning curve corresponding to





Fig. 6.7. Section of the R-level Learning Curve for
$C_3$ for Lists II, III. The solid line
is within a cycle, and the dotted line,
between cycles.
the first two cycles for $C_3$ stimuli. The large drop from R-trial 2
to R-trial 3, without such a drop from trials 1 to 2, is strongly
suggestive of the fact that Ss only recognize the common component on its
second appearance and then hook the response to it on that trial. Other
R-level analyses, including the correlation of P-level protocols (see
p. 21), revealed no additional significant tendencies for R-level
learning.

In conclusion, we have seen that a single overlap component can
result in a highly significant tendency for stimuli sharing that
component to be learned faster. Also, we have seen that the way in which
common components are utilized changes across successive lists; however,
the simple all-or-none ideas discussed in Chapters 2 and 3 prove unable
to account for the pattern of results on any of the lists. Finally, a
portion of the R-level analysis helped reveal the nature of the process
explaining the results shown in the P-level analysis. In the hope of
obtaining more errors, while still retaining the general overlap paradigm
presented in this chapter, Experiment II was performed to illuminate the
nature of the overlap facilitative effect discussed in Experiment I.







Experiment II 



Method 

The design and procedure for Experiment II were essentially the same
as those for Experiment I, except for the following modifications.
Twenty-one subjects were run. The data from one S were excluded, since she
thought that she was supposed to write down the S-R pairs as they
appeared (she was a native German and had a limited mastery of English).
The first list had a 2 $C_3$, 2 $C_0$ structure (just as the first list of
Experiment I); however, boys' first names were used as the components
instead of animal names.

The major departure from Experiment I was to make the second two
lists have 16 stimuli each. The stimuli were partitioned into 4 sets of
4 stimuli, and each set had the same structure. The structure for all
sets of 4 stimuli was that 3 of the 4 stimuli shared a single common
component, whereas the 4th consisted of all unique components and provided
a control for the learning of the three with an overlap component. Denote
by $C_3'$ the three stimuli which shared a component and by $C_3''$ the single
stimulus with all unique components. Finally, the components for Lists
II and III were either animal names or names of common American cities,
i.e., a random one of these two lists would have animal name components
and the other one names of cities as the components.

It was hoped that, by increasing the list length from 12 to 16 and
using the more difficult (established by a pilot study) city and animal
names, learning would be retarded. In retrospect, this hope was only
partially justified.
Results and Discussion

As was expected, the data from List I were very similar to the data
from List I of the preceding experiment. This was anticipated, because
both lists had a 2 $C_3$, 2 $C_0$ structure. The single important
difference (which was expected) was that List I for this experiment proved
easier than List I from the preceding experiment. No comprehensive
analysis of the data from this list is presented here; the reader is
referred to the discussion of List I for Experiment I for the major
qualitative features of the data. The learning curve for this list,
however, is presented in Fig. 6.8. Figure 6.8 is similar to the learning
curve for List I (Experiment I) in Fig. 6.1; however, it is not quite so
S-shaped. Next, we move to the analysis of Lists II and III.

Unfortunately, there was still a learning-to-learn effect from
List II to List III, and therefore their analysis will be carried out
separately. This difference was not anticipated, since it did not occur
measurably from List II to List III in the preceding experiment. Perhaps
it can be attributed in part to the similarities in structure of Lists
II and III. Also, the fact that the lists were longer, and hence the S
got more experience from List II, and the fact that the warm-up task was
easier with consequently less experience prior to List II, might have
contributed to this learning-to-learn effect. Even with this necessary
separation in analyses, there were 240 $C_3'$ and 80 $C_3''$ P-level protocols
for each list.

The learning curves for List II and List III are presented in Fig. 6.9,
the mean total errors and mean trial number of last error in Table 6.3,
the distributions of T and L in Figs. 6.10 and 6.11, respectively, and
Fig. 6.8. P-level Learning Curve for $C_0$ and $C_3$ for
List I (Exp. II).

Fig. 6.9. P-level Learning Curves for $C_3'$ and $C_3''$,
Lists II, III (Exp. II).
Table 6.3. Mean Total Errors, E(T), and
Mean Trial Number of Last Error, E(L),
for $C_3'$ and $C_3''$, Lists II, III.

    List              E(T)            E(L)

    II   $C_3'$       1.38            1.79
    II   $C_3''$      1.87            2.67
    III  $C_3'$       0.93            1.19
    III  $C_3''$      [illegible]     2.33
Fig. 6.10. Distributions of Total Errors, T, for $C_3'$
and $C_3''$, Lists II, III (Exp. II).

Fig. 6.11. Distributions of the Trial Number of Last Error,
L, for $C_3'$ and $C_3''$, Lists II, III (Exp. II).
$\Pr(x_n = 1 \mid L > n)$ in Table 6.4, and $\Pr(x_{n+1} = 1 \mid x_n = 1)$ in
Table 6.5. It should be emphasized that these statistics are for a P-level
analysis of the data from Lists II and III.

The learning curves in Fig. 6.9 show that learning was much faster
for $C_3'$ stimuli than for $C_3''$ stimuli. Also, the curves for Lists II
and III demonstrate a fairly striking learning-to-learn effect for $C_3'$
stimuli, i.e., $C_3'$ stimuli in List III were learned much more rapidly
than they were in List II. Neither the two $C_3''$ nor the List III $C_3'$
learning curves are exponential in shape. The $C_3''$ curves take about
equal drops in the error probability for the first three trials and the
List III $C_3'$ curve drops much too rapidly from trial 1 to 2 to be
approximated by an exponential function. This evidence, as well as other
evidence, suggests that the data would not be fit well by a P- or R-level
one-element model or the all-or-none multi-level model, since all three
models imply an exponential P-level learning curve (see Chapter 2, p. 18
and Chapter 3, p. 59).

Table 6.3 presents more evidence on the learning-to-learn effect
and the superiority of $C_3'$ over $C_3''$ stimuli in learning rate. Much to
the writer's chagrin, the 16-item list proved remarkably easy for the
Stanford students, so it is very difficult to undertake any elaborate
protocol analyses. The E(T) column in Table 6.3 shows how few errors
were actually made to the stimuli. The T distributions in Fig. 6.10
reveal geometric-like distributions; however, the L distributions in
Fig. 6.11 seem not to be geometric. Finally, the strongest indicator
that a model with more than a single stage all-or-none feature is needed
to account for these data is seen in the tendency for $\Pr(x_{n+1} = 1 \mid x_n = 1)$
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